In very large distributed computer systems, there are significant problems
control systems can be built based on a set of seven design principles which we
describe.  We  also  apply  these  principles  to  the  problem  of  decentralized  load 
balancing,  and  provide  results  based  on  trace-driven  simulation  experiments. 

Our  approach  is  knowledge-based,  by  which  we  mean  that  an  agent  will 
make  use  of  heuristics  and  domain-specific  knowledge  about  the  behavior  of 
itself  and  other  agents  to  make  good  decisions.  A  powerful  technique  we 
present  is  one  that  agents  use  to  quantify  the  uncertainty  of  information  they 
have,  and,  based  on  these  quantifications,  to  make  better  decisions.  Agents 
adapt  their  decisionmaking  to  changing  conditions  by  observing  the  system  at 
infrequent  (to  minimize  communication  overhead)  and  opportune  times,  and 
then  relying  on  their  inference  capabilities  between  observations.  To  minimize 
the  occurrence  of  mutually  conflicting  decisions,  we  introduce  a  technique 
called  SPACE/TIME  Randomization,  which  provides  implicit  coordination  of 
agents  and  requires  minimal  communication.  The  solutions  we  present  are 
based  on  a  combination  of  extensions  of  decision  theoretic  techniques  and 
artificial intelligence techniques.

CHAPTER 1


INTRODUCTION 


A distributed computer system [Davi81] [Ensl78] with decentralized resource control
[Abra80] [Jens78] is a collection of agents which reside on a (geographically)
distributed set of computers and which control resources so that work can be carried
out in an integrated fashion. In this dissertation, we investigate the problem of
designing such systems. A key feature is that control of all resources is, to varying
degrees, distributed amongst the agents, i.e., decentralized. The goal is to find a way
for the agents to coordinate their actions to maximize some index of system
performance. (Our main interest lies in performance optimization.)

The  systems  we  are  exploring  have  special  characteristics.  Every  resource  belongs 
to  a  particular  agent,  that  is,  it  is  directly  controlled  by  that  agent.  An  agent  may 
also  accept  requests  from  other  agents  for  the  use  of  its  resources.  So,  in  a  sense,  all 
agents  have  an  indirect  control  over  all  resources,  by  having  (varying  degrees  of) 
access  to  them.  Although  agents  can  act  autonomously,  we  are  interested  in  making 
them  cooperate.  In  fact,  we  limit  the  scope  of  this  research  to  cooperative  systems 
where  agents  will  not  act  maliciously.  The  goal  is  to  determine  how  to  make  agents 
cooperate  effectively,  given  that  they  are  willing  to  do  so. 

The  systems  of  interest  have  a  large  number  of  agents  and  resources,  at  least 
tens,  but  more  likely  hundreds  or  even  thousands  of  them.  This  means  that,  at  any 
point  in  time,  it  is  likely  that  there  will  be  many  simultaneous  requests  for  resources, 
and  that  there  will  be  many  resources  from  which  to  select. 

We  are  concerned  with  an  agent’s  main  activity,  decisionmaking.  Agents  must 
decide  when  to  make  use  of  resources,  and  which  resources  are  most  beneficial  at  that 
time.  Consequently,  agents  are  interested  in  the  states  of  resources,  which  they  can 
determine  by  observation  through  communication.  Since  communication  takes  time, 
all received state information is delayed by transmission time. This situation is
acceptable because the state information on which agents base their decisions does
not have to be perfect. In fact, bad decisions are tolerable, so long as they do not
occur  very  often.  This  is  in  contrast  to  database  control  problems,  where  data 
integrity  requirements  are  very  stringent. 

A  major  constraint  is  that  the  agents  have  real-time  response  requirements.  The 
time  it  takes  to  make  a  decision,  along  with  its  expected  consequences,  is  a  cost  which 
must be taken into account. This rules out solutions which involve mathematical
programming or exhaustive searches. Agents must make reasonably good decisions in a
reasonably short amount of time, where what is "reasonable" depends on the problem.

Why is it important to consider decentralized control systems? The current
trend  of  computer  systems  is  toward  larger  numbers  of  loosely  connected  processing 
nodes  [Ande87].  Much  progress  has  been  made  in  addressing  the  communication 
problem in such systems, specifically, the establishment of network protocols for
information transmission [Tane80]. This has heightened the potential for the effective
sharing  of  all  resources  by  all  agents  (for  which  there  still  remains  much  work  in 
establishing  adequate  protocols).  To  realize  this  potential,  we  need  to  develop  good 
solutions  for  the  coordination  problem,  specifically  for  the  harmonious  interaction  of 
all  agents  in  the  sharing  of  all  the  resources. 

There  are  significant  problems  when  one  considers  decentralization  of  control. 
Probably  the  most  difficult  is  that  agents  have  limited  knowledge  about  the  state  of 
other  agents  and  resources.  They  must  make  decisions  based  on  partial  information, 
which is often out-of-date and sometimes incorrect. Further, communication is not
free. Agents must be prudent in deciding when to request updated state information.
Agents must also work within real-time constraints. They must realize the
tradeoffs between the quality of a decision and the time it takes to achieve that
quality.

In  developing  solutions  for  these  problems,  the  following  philosophy  will  be 
adopted.  Agents  will  seek  to  infer  state  information  about  resources  and  other  agents 
rather than relying solely on explicit communication, which will be done frugally.
These inferences will be based on knowledge of behavioral models of the resources and
of the other agents. These models are acquired through the agent’s ability to observe
and summarize past behavior, through the initial programming of a human expert, or
through both methods. Coordination will also be achieved through implicit
communication (e.g., information sharing and inference) rather than relying solely on
explicit communication. The reader will note that a major tenet of this philosophy is
to replace communication with local computation (e.g., inferencing) whenever possible.
This is of critical importance when the systems under consideration are very large, and
the number of messages required for global communication (e.g., broadcasting) is
unreasonably high, causing excessive overhead.

1.1.  Motivations  for  Decentralized  Control  Systems 

Let us consider more deeply the question of why the study of decentralized control
systems is important. In particular, why should decentralized control be preferred
to a centralized scheme? We will argue that the potential benefit of decentralized
over centralized control is so great that it outweighs the difficult problems it poses.
And yet, these problems have caused many designers to resort to centralized schemes,
as they lacked methods for dealing with them. So, we shall now review the potential
benefits of decentralized control. Afterwards, we will consider in detail the
corresponding problems.

Complexity Management

Consider  a  large  distributed  computer  system,  one  with  hundreds  or  thousands  of 
agents.  It  is  clear  that  the  global  control  of  such  a  system  in  real  time  is  too  complex 
for any single agent. There are just too many objects to monitor and control
simultaneously. Since a single controlling agent must receive information from and send
commands to all other agents, bottlenecks are likely to occur on the communication
paths leading to this agent. Division of labor amongst many cooperating agents is one
way of managing this complexity. Ideally, control is distributed so that each agent
accepts part of the burden of control and contributes to the effective global control of
the system. Moreover, control-communication paths are short and uniformly
distributed throughout the system, minimizing the likelihood of communication
bottlenecks.

Speed 

When control is distributed among many agents, multiple decisions are made in
parallel, which offers a potential increase in system performance. Control decisions
can be made near the objects they are to affect; thus, state information sent to the
decisionmaker and the commands it issues travel short distances, without appreciably
reducing speed (i.e., system throughput and responsiveness).

Reliability 

A major reason for designing distributed systems is the reliability they offer. By
using distribution and redundancy of control, single points of failure which are
characteristic of centralized control systems are avoided. The potential benefits in speed
discussed above offer the possibility of using more sophisticated reliability (fault
detection and recovery) algorithms.

Scalability 

In  a  fully  decentralized  control  scheme,  all  agents  may  follow  the  same  control 
algorithm;  consequently,  the  distribution  of  control  is  symmetric  amongst  all  the 
agents.  Scaling  the  system  up  to  comprise  a  larger  number  of  agents  is  indeed  simpler 
than if control were distributed asymmetrically amongst the agents. Asymmetric
distributions of control (e.g., master/slave relationships) force an a priori grouping of
agents, which must necessarily change when more agents are added. A symmetric
distribution of control imposes no a priori grouping. Instead, natural groupings will
evolve out of necessity (i.e., an agent will be likely to impose more control on objects
in its immediate vicinity than on those which are distant).

Autonomy 

Finally, an important positive characteristic of decentralized control is that
agents can act autonomously. Each agent is in full control of itself, and need not
necessarily rely on other agents. Control is then shared, not by imposition, but rather
because it is in each agent’s interest to do so. This is important in considering present
(and, even more, future) distributed systems, whose parts are often owned by different
organizations. These organizations are generally not willing to give up power over
their machines, but find it reasonable to share power because it is in their best interest
to do so.

1.2.  The  Problems  of  Decentralized  Control 

Although  decentralized  control  offers  complexity  management,  speed,  reliability, 
scalability,  and  autonomy,  there  are  associated  problems  which  have  no  apparently 
simple  solutions;  rather,  they  seem  insurmountable. 

Consider  the  mind-set  of  an  agent  taking  part  in  a  decentralized  control  scheme. 
It  is  summed  up  succinctly  in  the  phrase: 

think  globally,  act  locally. 

An  agent’s  control  decisions  should  be  based  on  the  global  system  state.  In  turn, 
an  agent’s  local  decisions,  taken  as  a  component  of  all  the  concurrent  decisions  made 
by  all  participating  agents,  will  affect  the  global  system  state.  Clearly,  these  decisions 
should  be  harmonious,  and  not  mutually  destructive. 

And  so,  we  encounter  the  first  fundamental  problem  of  decentralized  control: 
each  agent  is  uncertain  about  the  current  global  system  state.  Information  about  the 
global system state is distributed. Since pieces of such information take varying
amounts of time to be received by any single agent, no agent can ever know, with
complete certainty, the current global system state. At best, an agent can determine a
past  global  state,  but  not  the  current  one.  And  yet,  agents  must  base  their  decisions 
on  what  they  believe  to  be  the  current  global  state. 

There  is  a  second  fundamental  problem  of  decentralized  control:  each  agent  is 
uncertain  about  the  current  actions  of  all  other  agents.  Since  an  agent  does  not  know 
what  other  agents  believe,  it  cannot  predict  what  they  will  do.  And  yet,  a  goal  of  all 
the  agents  is  to  make  harmonious  decisions. 

Finally,  as  if  these  problems  were  not  difficult  enough,  there  is  the  constraint 
imposed  on  all  agents  that  they  make  fast  decisions.  The  time  used  to  make  decisions 
has  a  cost,  and  must  be  minimized.  These  decisions,  which  are  complex  because  they 
must take decentralization into account, are generally more costly in time than
decisions under a centralized control scheme.

In  summary,  agents  must  make  decisions  based  on  global  state  information  of 
which  they  are  uncertain;  agents  are  uncertain  of  each  other’s  actions  and  yet  must 
act  harmoniously;  agents  are  under  time  pressure  to  make  their  decisions,  which  must 
therefore  be  not  just  good,  but  also  fast.  Thus,  we  are  challenged  with  the  question: 
can  the  benefits  of  decentralized  control  really  outweigh  the  difficulties?  Our  thesis  is 
that this question can be answered in the affirmative. In the next section, we will
outline a formula for the solution, which is the main subject of this dissertation.

1.3. Formula for a Solution

Underlying  the  solution  we  adopt  to  deal  with  the  problems  presented  above  is 
the  idea  that,  in  all  real  systems,  there  are  patterns  in  their  behavior.  If  only  these 
patterns  could  be  recognized  and  encoded,  they  could  be  used  for  predicting  future 
events.  In  the  case  of  decentralized  control  systems,  knowledge  of  these  patterns  and 
knowledge  of  recent-past  state  information  could  be  used  to  infer  the  current  system 
state. 
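
As a rough illustration of this idea, the sketch below infers a resource's current load from a stale observation. It assumes a hypothetical behavioral pattern that is not taken from this dissertation: the load relaxes exponentially toward a known long-run mean, so the weight given to the last observation decays with its age. The names infer_load, long_run_mean, and time_constant are illustrative only.

```python
import math


def infer_load(last_observed: float, age: float,
               long_run_mean: float, time_constant: float) -> float:
    """Estimate the current load from an observation that is `age` time units old.

    Assumed pattern (hypothetical): load drifts exponentially toward
    `long_run_mean` with characteristic time `time_constant`.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if time_constant <= 0:
        raise ValueError("time_constant must be positive")
    # Weight on the stale observation shrinks as the observation ages.
    weight = math.exp(-age / time_constant)
    return weight * last_observed + (1.0 - weight) * long_run_mean


# A fresh observation is trusted fully; an old one is mostly replaced
# by what the behavioral model predicts.
fresh = infer_load(last_observed=10.0, age=0.0, long_run_mean=2.0, time_constant=5.0)
stale = infer_load(last_observed=10.0, age=20.0, long_run_mean=2.0, time_constant=5.0)
```

The same weight could also serve as a crude measure of how much confidence an agent places in its last observation, which is one possible way to decide when a new observation is worth its communication cost.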

The  solution  we  present  is  a  knowledge-based  solution,  by  which  we  mean  that  an 
agent  will  make  use  of  heuristics  and  domain-specific  knowledge  about  the  behavior  of 
itself  and  other  agents  to  make  good  decisions.  A  powerful  technique  we  present  is 
one  the  agents  can  use  to  quantify  the  uncertainty  of  information  they  have,  and, 
based  on  these  quantifications,  to  make  better  decisions.  Finally,  agents  adapt  their 
decisionmaking  to  changing  conditions  by  observing  the  system  at  infrequent  (to 
minimize  communication  overhead)  and  opportune  times,  and  then  relying  on  their 
inference  capabilities  between  observations.  The  solutions  we  present  are  based  on  a 
combination  of  extensions  of  decision  theoretic  techniques  and  artificial  intelligence 
techniques. 

We  proceed  as  follows.  In  Chapter  2,  past  work  in  related  areas  is  discussed.  In 
Chapter  3,  a  formal  model  of  decentralized  control  is  presented.  In  Chapter  4,  the 
principles  and  techniques  for  dealing  with  the  problems  of  decentralized  control  are 
introduced. In Chapter 5, we discuss the application of these principles to a
particular decentralized control problem, namely, load balancing. In Chapter 6, we present
the  results  of  load  balancing  experiments  to  illustrate  the  feasibility  of  our  solutions. 
In  Chapter  7,  we  summarize  our  conclusions. 


CHAPTER  2 


PAST  RELATED  WORK 


In  this  chapter,  we  first  explore  the  theoretical  foundations  upon  which  this 
dissertation  is  based.  Next,  we  discuss  a  number  of  areas  of  research  which  are  closely 
related to our work. Finally, we review two relevant experimental studies which were
sources  of  inspiration  for  this  dissertation. 

2.1.  Foundations:  Decision  Theory 

According to Blackwell and Girshick, "decision theory applies to statistical
problems the principle that a statistical procedure should be evaluated by its
consequences..." [Blac54]. This principle is rooted in Neyman and Pearson’s theory of
hypothesis testing [Neym33], and was extended to all statistical problems by Wald
[Wald50].

Decision theory attempts to provide a model for making optimal decisions based
on uncertain information. The uncertainty of information is quantified by a
probability measure on the possible values of variables, and optimality depends on the
evaluation of possible outcomes, as modelled by utility theory. The structure of a
decision theoretic problem is to select a decision which maximizes the expected utility
over all possible consequences.
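
The structure just described can be sketched in a few lines. The example below is a minimal illustration, not a method from this dissertation: the actions, states, probabilities, and utility values are invented for the purpose, and the function names expected_utility and best_action are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def expected_utility(action: str,
                     state_probs: Dict[str, float],
                     utility: Dict[Tuple[str, str], float]) -> float:
    """Average the utility of `action` over the uncertain state."""
    return sum(p * utility[(action, state)] for state, p in state_probs.items())


def best_action(actions: Iterable[str],
                state_probs: Dict[str, float],
                utility: Dict[Tuple[str, str], float]) -> str:
    """Select the action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(a, state_probs, utility))


# Hypothetical numbers: belief about a remote machine, and the payoff of
# each (action, state) pair.
state_probs = {"remote_idle": 0.7, "remote_busy": 0.3}
utility = {
    ("run_local", "remote_idle"): 4.0,
    ("run_local", "remote_busy"): 4.0,
    ("send_remote", "remote_idle"): 8.0,
    ("send_remote", "remote_busy"): 1.0,
}

choice = best_action(["run_local", "send_remote"], state_probs, utility)
```

With these assumed numbers, sending the job away has expected utility 0.7 * 8.0 + 0.3 * 1.0 = 5.9, against 4.0 for running it locally, so the remote action is selected even though it is the worse choice in one of the two states.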

One  of  the  problems  with  a  decision  theoretic  formulation  is  that  the  probability 
distributions  of  all  random  variables  are  assumed  to  be  known  by  all  decisionmakers. 
Most  often,  this  is  not  the  case.  In  fact,  some  of  these  distributions  may  not  even 
exist.  Consequently,  approximate  forms  of  reasoning  have  been  developed,  such  as 
Zadeh’s  fuzzy  reasoning  and  possibility  theory  [Zade7/]  [Zade79],  and  the  Dempster- 
Shafer  theory  of  beliefs  [Shaf76]. 

Game  theory 

Decision theory is really a special case of the earlier theory of games, first
introduced by Borel [Bore21], developed and generalized by von Neumann [Neum28], and
then appearing in the definitive work by von Neumann and Morgenstern, Theory of
Games and Economic Behavior [Neum47]. Game theory focuses on the problem of
dealing with adversaries, and predicting their actions in the absence of communicated
data. The structure of a game theoretic problem is similar to that of a decision
theoretic problem, except that there are now multiple decisionmakers (usually, but not
necessarily, two), and the nature of possible information uncertainty lies in the
inability to perfectly predict the outcomes of moves (i.e., of decisions) because they are
governed by chance, or because they are the moves of other decisionmakers, which
cannot be anticipated (a constraint imposed by the structure of the problem), or both.
A concise treatment of game theory appears in [Blac54], and a more in-depth survey
and analysis appears in [Luce57].

Team  decision  theory 

Team decision theory, first proposed by Marschak as an outgrowth of organization
theory [Mars55], and further developed by Radner [Radn62], combines game
theory’s notion of multiple decisionmakers and decision theory’s notion of optimal
decisionmaking based on uncertain information and utility. Team decision theory
stresses the distributed nature of the decisionmakers, in that the availability of
information is governed by constraints placed on their modes of interactions (e.g.,
limitations on communications and observations). Decisionmakers often have different but
correlated  information  about  underlying  system  dynamics.  Decisionmakers  are  part  of 
a  team:  they  are  usually  not  adversarial  (as  is  often,  but  not  always,  the  case  in  game 
theoretic  problems)  in  that  they  work  together  to  solve  a  single  problem.  Thus,  there 
is  a  need  for  coordinated  actions  to  realize  a  positive  payoff. 

Witsenhausen  [Wits68]  describes  a  team-theoretic  decision  problem  as  having  five 
ingredients: 

(1)  states  of  nature:  a  vector  of  random  variables; 

(2) observations: functions of the states of nature;

(3)  decision  variables:  functions  of  the  observations; 

(4)  strategy:  decision  rules  which  must  be  determined  by  design; 

(5)  loss  criterion:  a  function  of  decisions  and  states  of  nature. 

A  team  decision  problem  is  to  select  decisions  which  minimize  the  expected  loss 
over  the  possible  states  of  nature.  Consequently,  the  goal  is  to  find  strategies  for 
decisionmakers  which  carry  out  this  minimization  of  loss.  In  [Ho80],  Ho  presents  a 
concise  tutorial  on  the  team  decision  problem  with  examples  and  a  survey. 
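
One way to write the five ingredients and the resulting optimization problem compactly is given below. The symbols are our own choice for illustration and are not necessarily those used in [Wits68] or [Ho80]: \xi denotes the states of nature, \eta_i the observation function of decisionmaker i, \gamma_i its decision rule, and L the loss criterion, for a team of n decisionmakers.

```latex
\begin{align*}
z_i &= \eta_i(\xi), & i &= 1,\dots,n
    && \text{(observations)} \\
u_i &= \gamma_i(z_i), & i &= 1,\dots,n
    && \text{(decision variables)} \\
J(\gamma_1,\dots,\gamma_n)
    &= \mathbb{E}_{\xi}\bigl[\, L(u_1,\dots,u_n,\xi) \,\bigr]
    &&&& \text{(expected loss)} \\
(\gamma_1^{*},\dots,\gamma_n^{*})
    &\in \arg\min_{\gamma_1,\dots,\gamma_n} J(\gamma_1,\dots,\gamma_n)
    &&&& \text{(optimal strategy)}
\end{align*}
```

Note that each decision rule \gamma_i sees only its own observation z_i, which is how this notation expresses the limits on what each decisionmaker knows.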

Distributed  knowledge 

The theories presented so far are all information based: decisions and their
quality are based on the value of information, its reliability, and its availability. But
there is also the problem of how distributed agents acquire knowledge. An agent’s
knowledge is not only concerned with the states of nature, but, because of the
multiplicity of agents and because they do not share a common memory (i.e., they are
distributed), it is also concerned with what other agents know. Halpern and Moses
explore this in [Halp84], where they develop the notion of knowledge hierarchies and
common knowledge in distributed environments. If a proposition p is common
knowledge, then p is known by all agents, and all agents know that all agents know p,
and all agents know that all agents know that all agents know p, and so on ad
infinitum. They show that common knowledge cannot be attained solely through
communication. This is important, as we will show that a perfect decentralized
control system, one where all agents always make the best decisions, cannot be built if
the current states and current actions of all agents are not common knowledge.
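
The infinite hierarchy in the definition of common knowledge can be stated compactly. The notation below is a common convention that we assume here for illustration; it is not quoted from [Halp84]. K_i p reads "agent i knows p", E p reads "every agent knows p", and C p reads "p is common knowledge" among n agents.

```latex
\begin{align*}
E\,p &\equiv \bigwedge_{i=1}^{n} K_i\,p \\
E^{1} p &\equiv E\,p, \qquad E^{k+1} p \equiv E\,(E^{k} p) \\
C\,p &\equiv \bigwedge_{k \ge 1} E^{k} p
\end{align*}
```

In this notation, a finite exchange of messages can at best raise the level k for which E^k p holds, while C p requires every level at once.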

2.2.  Decentralized  Control  Applications 

We now consider different areas of research concerned with decentralized control
problems.  Within  each  area,  we  will  discuss  a  number  of  research  studies  which 
influenced  our  research. 

Control-theoretic  techniques 

Research on control-theoretic techniques for decentralized control has focused
on  optimization  based  on  decomposition.  A  problem  is  decomposed  into  subproblems 
with  interactions  modelled  by  interaction  variables.  In  general,  interaction  is  limited 
due  to  constrained  cooperation  (by  design)  amongst  the  individual  subproblem-solvers 
in  order  to  make  the  analysis  tractable. 

In [Jarv75], Jarvis presents an informative survey of adaptive control
optimization based on such decomposition techniques. He discusses a number of methods,
including gradient, correlation, random, stochastic automata, fuzzy automata, pattern
recognition, and mixed strategies. In [Sand78], Sandell, Varaiya, Athans and Safonov
present a fairly complete survey (up to 1978) of decomposition techniques for
decentralized and hierarchical control, and mathematical methods for analyzing large scale
systems. For a tutorial on distributed control theory based on decomposition
techniques, see [Lars79].

Many  of  the  algorithms  based  on  decomposition  techniques  sequentially  solve  a 
(possibly  infinite)  number  of  subproblems,  whose  solutions  are  expected  to  converge  to 
the optimal solution for the main problem. In [Cohe78], Cohen discusses the
principles of such "decomposition-coordination (two-level) algorithms."

Another  problem  of  interest  is  that  of  data  fusion,  the  combining  of  data  from  a 
distributed  set  of  sources.  In  a  decentralized  control  system,  each  agent  may  be  both 
a  source  of  data  (which  is  sent  to  other  agents)  and  a  combiner  of  data  (which  is 
received  from  other  agents);  consequently,  data  fusion  potentially  occurs  at  each 
agent.  In  [Tenn81c],  Tenney  and  Sandell  present  an  extension  of  detection  theory 
(see  [VanT68])  to  problems  requiring  distributed  sensors,  which  has  applications  to 
data  fusion.  See  [Draz78]  for  another  data  fusion  application. 

Formal  models 

In  [Tenn81a],  Tenney  and  Sandell  present  a  formal  model  for  distributed 
decisionmaking  agents.  This  is  a  refinement  of  the  general  team  decision  model  found 
in [Ho80], in that they transform the notion of state-space into one that is more
distributed, and knowledge of agent states is more compartmentalized. In essence, by
restricting the scope of each decisionmaker’s knowledge of the underlying system
dynamics, the search for optimality becomes more tractable. In a companion paper
[Tenn81b], they present a formal model for distributed decisionmaking coordination
strategies based on communication, prediction, and abstraction. They analyze how
information uncertainty is reduced by considering various types of communication
and different organizational structures.

Casavant  and  Kuhl  present  a  formal  model  of  distributed  decisionmaking  in 
[Casa86],  based  on  graph  and  finite  automata  theory.  Their  model  is  novel  in  that 
they  explicitly  account  for  message-passing.  They  also  give  an  example  of  how  the 
model  can  be  applied  to  the  problem  of  load  balancing. 

Distributed  problem  solving 

Distributed  problem  solving  (DPS)  is  the  activity  of  multiple  agents  seeking  to 
solve a single problem in a cooperative but decentralized fashion. The agents are
generally high-performance, loosely-coupled, semiautonomous computers. The focus of
DPS research has been on developing effective methods of interaction for agents.
Specifically, this includes techniques for cooperation and coordination through
selective information sharing with limited amounts of communication. A collection of
papers  describing  the  current  state  of  research  in  this  area  can  be  found  in  [Huhn87]. 

Early work in DPS research was done by Victor Lesser and Lee Erman [Less78],
which led to their model for distributed interpretation in [Less80]. We will review this
work in the next section. In [Less81], Lesser and Corkill describe an approach for
structuring distributed processing systems which they call functionally
accurate/cooperative. Agents in these systems cooperate and function effectively
despite inconsistent views of state information due to distribution-caused uncertainty.

In related research, Corkill and Lesser discuss the problem of attaining global
coherence [Cork83] through meta-level control for the coordination of agents. The
network organizational structure paradigm is presented, which specifies the control of
information and the relationships between nodes. Specifically, this paradigm is
concerned with allocation of tasks such that local activity within agents is encouraged,
and the need for inter-agent communication is reduced. In [Less83], Lesser and
Corkill describe an application of these techniques to the problem of vehicle
monitoring with information collected by a geographically distributed network of sensor
nodes.

In [Smit81], Smith and Davis also describe frameworks for DPS cooperation. In
particular,  they  discuss  the  contract  net  protocol  (see  also  [Smit80]  and  [Davi83]),  one 
of  the  major  cooperation  paradigms  developed  by  DPS  research.  In  the  contract  net 
paradigm,  agents  negotiate  and  make  contractual  agreements  over  tasks  to  be  carried 
out.  This  differs  significantly  from  other  forms  of  cooperation  in  that  agents  can  back 
out  of  the  negotiation  process  at  any  time,  rather  than,  say,  submitting  to  majority 
rule  as  in  voting  paradigms. 

In [Stee86], Steeb, McArthur, Cammarata, Narain, and Giarla discuss cooperation
strategies for distributed problem-solving agents by analyzing organizational policies
(task decomposition and assignment), and information-distribution policies (the
nature of inter-agent communication). They present a framework for DPS, and its
application to the problem of air fleet control. See also [Camm83] for a more detailed
discussion of cooperation strategies for distributed air-traffic control.

Genesereth,  Ginsberg,  and  Rosenschein  propose  in  [Gene85]  a  different  approach 
to  cooperation,  using  a  metaphor  for  communication  based  on  game  theory. 
Specifically,  agents  with  differing  goals  try  to  produce  better  results  than  other  agents 
working  on  the  same  problem.  In  this  approach,  agents  interact  less,  and  conflict 
dominates  their  activities. 

Finally,  in  [Gins87],  Ginsberg  identifies  the  conflict  between  agents  pursuing 
purely  local  optimizations  and  the  need  for  cooperation  and  coordination.  He  shows 
the desirability of the common rationality assumption (equipping agents with
matching decision procedures). Under this assumption, one can precisely characterize the
communication  needs  of  agents  in  the  context  of  cooperation  and  coordination. 

Job  scheduling 

We end this section with a discussion of decentralized approaches to job
scheduling, since in Chapters 5 and 6 we will apply our techniques to this problem to
demonstrate their feasibility. Although a great deal of research has been carried out on the
job  scheduling  problem  (see  [Casa88]  for  a  taxonomy  and  a  survey  of  job  scheduling 
in  distributed  computer  systems),  we  focus  on  a  small  number  of  select  works  which 
were  relevant  to  our  research  in  that  they  based  their  solutions  on  decision-theoretic 
techniques  or  distributed  problem-solving  paradigms. 

In  [Chou82],  Chou  and  Abraham  propose  a  decision-theoretic  approach  to  the 
problem  of  assigning  tasks  to  processors  which  have  different  speed  and  reliability 
characteristics. Although their approach was static (i.e., it did not respond to changing
loads), and it assumed the availability of a priori information about tasks (which
is  generally  not  the  case  in  practice),  it  was  the  first  to  make  explicit  use  of  decision 
theory  and  attain  good  theoretical  results. 

In [Malo84], Malone, Fikes and Howard considered task allocation using the contract
net protocol. In their simulation experiments, they used the metaphor of a
marketplace  in  which  the  bids  represented  estimates  by  the  bidding  nodes  of  when 
they  could  complete  the  processing  of  a  task.  Although  they  also  had  to  assume  the 
availability  of  a  priori  information  about  task  execution  requirements,  they  were  able 
to  achieve  good  performance  with  low  communication  overhead. 

Stankovic  has  conducted  significant  research  in  decentralized  job  scheduling.  In 
[Stan84b],  he  presents  three  adaptive  algorithms  for  decentralized  scheduling  which 
assume no a priori knowledge about jobs. His simulation results show that decentralized
algorithms exhibit stable behavior and improve performance at modest cost. In
[Stan85],  he  successfully  applies  Bayesian  decision  theory  to  job  scheduling.  We  will 
review  this  research  in  the  next  section. 

2.S. Review of Two Significant Past Studies

We now consider two studies which were particularly relevant to this dissertation.
The first, by Victor R. Lesser and Lee D. Erman, describes an experiment based
on  a  model  for  distributed  interpretation  which  allows  for  inconsistent  and  incomplete 
views  of  the  global  system  state  [Less80].  The  second,  by  John  A.  Stankovic,  reports 
on  a  simulation  study  of  decentralized  job  scheduling  based  on  an  application  of 
Bayesian  decision  theory  [Stan85]. 

Distributed  Interpretation 

Lesser and Erman [Less80] present a model of distributed processing which explicitly
deals with the problems of distribution-caused uncertainty and errors in control,
data, and algorithm. In their study, they apply this model to the problem of distributed
interpretation. They define an interpretation system as a transformer of a set of
signals from some environment to higher level descriptions of objects and events in the
environment. A distributed interpretation system is one in which sensors for signal
reception are widely distributed, interpretation requires data from multiple sensors,
and the system must operate in a distributed manner, since communicating all
information to a centralized interpreter is undesirable (often because of real-time
response constraints, limited communication bandwidth, and reliability concerns).

Their  model  is  based  on  interpretation  techniques  found  in  knowledge-based 
artificial  intelligence  systems,  which  use  the  problem-solving  paradigm  of  solution 
search  through  the  incremental  aggregation  of  partial  solutions.  These  techniques 
handle,  as  an  integral  part  of  the  interpretation  process,  the  existence  of  uncertainty 
in  the  input  data  (e.g.,  due  to  noisy  channels),  and  the  possibility  of  incorrect  and 
incomplete  knowledge.  In  particular,  they  use  speech  understanding,  based  on  the 
Hearsay-II  system  [Erma80],  as  the  vehicle  for  experimentation  with  their  model. 

Their  distributed  system  consists  of  three  nodes,  each  of  which  is  a  functionally 
complete Hearsay-II system with access to one segment of the speech data of an utterance.
The nodes generate a single unified interpretation of an utterance by communicating
and resolving, in a cooperative and competitive fashion, partial tentative
interpretations based on their local views. Each node includes a number of
knowledge sources, which are independent modules containing separate areas of
knowledge, such as acoustics, phonetics, syntax, and semantics. A hypothesize-and-test
problem-solving paradigm is used iteratively to arrive at partial solutions. Consequently,
control is decentralized: it is asynchronous and data-directed, and synchronization
is obviated by the self-correcting nature of information flow between knowledge
sources.  Their  primary  goal  in  the  decomposition  of  the  problem  was  to  minimize 
internode  communication  relative  to  intranode  processing. 

The  results  of  this  research  are  significant  because  of  the  techniques  it  proposes, 
which  we  believe  can  be  used  as  general  structures  for  solving  decentralized  control 
problems.  These  include: 

(1) structuring a problem-solving activity as a distributed, incremental, asynchronous,
and opportunistic process, in that solution paths are investigated in the
order of how promising they are, based on the reliability (degree of certainty) of
information;

(2) minimizing internode communication by exchanging high-level, abstract information,
which is more succinct and more readily usable than low-level information
(e.g., raw inputs);

(3)  arriving  at  the  solution  by  incremental  aggregation,  which  allows  the  system  to 
be  self-correcting  in  spite  of  incorrect  and  incomplete  information. 

Decentralized  Job  Scheduling 

In  this  work,  Stankovic  applies  a  heuristic  based  on  Bayesian  decision  theory  to 
decentralized  job  scheduling  [Stan85];  the  main  advantage  of  this  approach  is  that  the 
heuristic dynamically adapts to the quality of state information. He considers a relatively
small distributed system of five nodes, where each node is a separate decisionmaking
agent, with the addition of two monitor nodes (only one is used, the other is a
backup),  to  compute  the  information  necessary  for  the  heuristic. 

An  agent’s  decision  as  to  whether  jobs  should  be  offloaded  depends  on  its  view  of 
the load conditions on all nodes. An agent's view is based on periodic state information
updates from all other agents. Agents periodically send state updates to the
monitor node, which computes probability distributions and tables of "maximizing
actions," indicating the utility of all possible actions for all possible views. These
tables are periodically sent back to the agents. Consequently, an agent's job scheduling
function is to consult its table of maximizing actions: given its view of the global
state,  the  best  action  (the  one  with  the  maximal  utility)  is  selected. 
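
The table-driven selection described above can be sketched as follows. This is only an illustration under stated assumptions, not Stankovic's implementation: the view labels, the action names, and all probability and utility values are hypothetical, and the monitor's computation is reduced to a single expected-utility maximization per view.

```python
from typing import Dict, Tuple

# Hypothetical data: P(true global state | agent's view), as a monitor
# node might estimate it from periodic state updates.
STATE_GIVEN_VIEW: Dict[str, Dict[str, float]] = {
    "others_idle": {"idle": 0.8, "busy": 0.2},
    "others_busy": {"idle": 0.3, "busy": 0.7},
}

# Hypothetical utility of taking an action when the true state is as given.
UTILITY: Dict[Tuple[str, str], float] = {
    ("offload", "idle"): 10.0,
    ("offload", "busy"): -5.0,
    ("keep", "idle"): 2.0,
    ("keep", "busy"): 4.0,
}

ACTIONS = ("offload", "keep")


def expected_utility(action: str, view: str) -> float:
    """Expected utility of an action, given the agent's (uncertain) view."""
    return sum(p * UTILITY[(action, state)]
               for state, p in STATE_GIVEN_VIEW[view].items())


def build_maximizing_table() -> Dict[str, str]:
    """Monitor-side step: record, for every view, the maximizing action."""
    return {view: max(ACTIONS, key=lambda a: expected_utility(a, view))
            for view in STATE_GIVEN_VIEW}


# Agent-side step: scheduling reduces to a table lookup on the current view.
TABLE = build_maximizing_table()


def schedule(view: str) -> str:
    return TABLE[view]
```

With these made-up numbers, an agent that believes the other nodes are idle offloads, while one that believes they are busy keeps its jobs.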

Simulation  experiments  were  carried  out  where  a  number  of  characteristics  and 
their  effects  on  performance  were  studied,  such  as:  the  state  information  update  period 
and  the  period  for  calculating  maximizing  actions;  the  mix  of  job  arrival  rates;  the 
delay  in  the  subnet;  the  job  scheduling  interval;  the  loss  of  monitor  nodes.  The 
results  of  these  simulation  experiments  were  analyzed  and  then  compared  with  results 
of  analytic  models  which  provided  theoretical  upper  and  lower  bounds  on  average  job 
response  time.  Stankovic  showed  that  his  heuristic  method  based  on  Bayesian  decision 
theory  not  only  performed  well  and  incurred  low  overhead,  but  was  robust  given  the 
presence  of  out-of-date  state  information.  The  main  limitation  of  this  approach  was 
in  the  centralization  of  the  probability  distribution  and  utility  computations:  although 
reliability  is  maintained  by  having  a  backup  monitor  node,  the  likelihood  of 
bottlenecks  occurring  on  communication  paths  leading  to  the  active  monitor  node 
increases  with  the  size  of  the  distributed  system. 

Observations  and  Summary 

We observe several common underlying themes in both approaches to the
activity of multiple agents working cooperatively, in a decentralized fashion, to solve a
problem. The first observation is the importance of handling uncertainty of information
as an integral part of the decisionmaking process. The second observation is the
preference for using imperfect and uncertain information in decisionmaking over possibly
better information which could only be acquired at a significantly higher cost.
The  third  observation  is  the  avoidance  of  explicit  synchronization  by  a  somewhat 
implicit  coordination  through  information  sharing.  Consequently,  errors  will  occur, 
but  they  are  tolerated  by  providing  for  self-correction. 

We  consider  the  ability  to  quantify  and  explicitly  account  for  the  goodness  of 
information on which decisions will be based to be of great importance in decentralized
control systems. When systems become very large (with hundreds or thousands
of  agents),  the  problem  of  communication  overhead  and  the  importance  of  agents 
relying  on  local  views  of  the  global  state  are  dramatically  accentuated.  Although  both 
of  the  studies  referred  to  above  considered  a  very  small  number  of  agents,  we 
hypothesize  that  the  three  observations  we  made  concerning  their  approaches  only 
become  more  relevant  when  larger  systems  are  considered.  One  of  our  goals  is  to 
extend  their  results  by  explicitly  considering  the  following  problems,  and  developing 
techniques  for  solving  them: 

(1)  how  decentralized  control  can  be  structured  in  large  distributed  systems; 

(2)  how  state  information  can  be  effectively  shared; 

(3)  how  not  only  the  value  of  information,  but  also  its  age,  can  be  incorporated  in 
decisionmaking. 


CHAPTER  3 


A  FORMAL  MODEL  FOR  DECENTRALIZED  CONTROL 


The purpose for creating and presenting a formal model for decentralized control
is mainly to establish a language for describing the objects of interest and their relationships.
Our goal is to describe, in precise terms, what the limitations are in decentralized
control applications. There exist other models, which focus on different levels
of abstraction, and have correspondingly different goals [Ho80] [Tenn85a] [Tenn85b]
[Casa86]. This model was devised not to replace these, but rather to focus on the particular
level of abstraction of interest here. In fact, this model freely borrows elements
and  builds  on  the  ideas  developed  from  these  existing  models,  in  particular  from  that 
of  Tenney  and  Sandell  [Tenn85a]. 

3.1.  Model  Requirements 

The  model  should  fulfill  the  following  requirements. 

•  It  should  comprise  the  following  basic  objects  of  interest:  machines  which  carry 
out  work;  work,  to  be  carried  out;  and  information,  to  be  communicated  and 
used  in  making  decisions. 

•  It  should  allow  for  the  system’s  distributed  nature. 

•  It  should  capture  the  basic  notion  of  system  activity  over  time. 

•  It  should  allow  machines  to  be  both  autonomous,  having  direct  control  over 
themselves,  and  cooperative,  allowing  them  to  coordinate  their  activities. 

•  It  should  allow  for  the  quantification  of  preferences  for  the  various  courses  of 
action  a  machine  can  decide  to  take. 

•  It  should  provide  for  objectives  for  machines  to  attain. 

At  the  conclusion  of  this  chapter,  we  will  review  whether  these  requirements  are 
indeed  satisfied. 

3.2.  The  Formal  Model 

A  decentralized  control  system  can  be  modeled  as  a  directed  graph  G.  Nodes  N 
represent  agents,  and  links  L  represent  inter-agent  influences.  Formal  definitions  of 
these  elements  of  a  model  are  as  follows: 

$G = (N, L)$, where $N$ = set of nodes and $L$ = set of links;

$N = \{A_i\}$, $1 \le i \le N$, where $A_i$ = agent $i$, $A_i \in \mathcal{A}$;

$L = \{Z_{ji}\}$, $1 \le i \le N$, where $Z_{ji}$ = influence of $A_j$ on $A_i$.

As mentioned briefly in Chapter 1, a distributed computer system with decentralized
resource control contains the following objects: agents, resources, and work. Each
resource belongs to, or is owned by, or is directly controlled by, one and only one
agent. With no loss of generality, we will limit each agent to owning at most one
resource in the model. Consequently, an agent directly controls the resource it owns,
if it owns any. (Note that an agent may indirectly control a remote resource by sending
requests to the corresponding remote agent, and these requests may be accepted or
denied.) An agent which does not own a resource is limited to indirect control of
some other resources. Since for each resource there exists an agent, agents may subsume
the activity of resources, and therefore only agents are explicitly modeled.
When  an  agent  wants  to  control  a  remote  resource  (indirectly),  we  will  simply  say 
that  the  agent  makes  a  request  to  the  agent  owning  that  remote  resource.  Resources 
will  no  longer  be  mentioned  explicitly.  We  first  consider  the  modeling  of  agents,  and 
later  that  of  inter-agent  influences. 

3.2.1.  The  Agent  Model 

The  model  of  an  agent  is  a  structure  with  eight  components  (i.e.,  an  8-tuple). 
The  first  component  is  the  agent’s  state, 

$$x_i(t) \in X_i = \{x_{i1}, x_{i2}, \ldots\}.$$

The agent's state is a time-dependent variable, whose values are chosen from a local
state space $X_i$. The value of an agent's state is of general interest. Remote agents
will decide whether to make requests based on what they believe this
value to be.

Each agent has its own source of tasks, which are units of work to be carried out
by the agent, and which are generated and submitted (e.g., by users):

$$s_i(t) \in W = \{w_0, w_1, w_2, \ldots\}.$$

The variable $s_i(t)$ is the locally generated work arriving at the agent at time $t$, and
takes on values from the system-wide work space $W$. Each member of $W$ represents
an atomic element of work. There is a special symbol, $w_0$, that represents the null
element of work. If at time $t$ no work arrives at agent $A_i$, then $s_i(t)$ takes on the
value $w_0$. Since this work is locally generated, we call $s_i(t)$ $A_i$'s generated work.

An agent's generated work will be considered as a stochastic process over time.
At any point in time, the value taken on by $s_i(t)$ is selected from the distribution
defined over the work space $W$. In general, this distribution is time-dependent.

An  agent  is  affected  by  other  agents  through  what  are  called  influences.  Directed 
links  (discussed  below)  in  the  graph  G  represent  influences;  all  incoming  links  to  a 
node  represent  all  the  external  influences  directly  affecting  an  agent.  This  is  modeled 
explicitly  as  part  of  an  agent's  structure: 

$$z_i(t) = (z_{1i}(t), z_{2i}(t), \ldots, z_{Ni}(t)).$$

The variable $z_{ji}(t)$ is the influence of $A_j$ on $A_i$. The total influences on agent $A_i$ are
represented by the vector $z_i(t)$.

Influences  come  in  two  varieties:  work  and  information.  A  work  influence  and  an 
information influence are given respectively by the vectors

$$w_i(t) = (w_{1i}(t), w_{2i}(t), \ldots, w_{Ni}(t));$$

$$k_i(t) = (k_{1i}(t), k_{2i}(t), \ldots, k_{Ni}(t)).$$

Agent $A_i$'s work influence at time $t$ is made up of all the work requests (made by
remote agents) which arrive at $A_i$ at time $t$. We call $w_i(t)$ $A_i$'s transferred work at
time $t$. The information influence on $A_i$ at time $t$ is any information about the state
of the system originating from agents in $\mathcal{A}$ and present in $A_i$ at time $t$. We call $k_i(t)$
$A_i$'s information concerning the system state. Note that at a later time, say $t+1$,
$w_i(t+1)$ consists of only the work transferred between the times $t$ and $t+1$ (or of null
work elements if no work is transferred), whereas $k_i(t+1)$ will be the same as $k_i(t)$,
unless new state information has arrived causing it to change. Thus, $k_i(t)$ is persistent,
whereas $w_i(t)$ is not. Precise definitions of $z_{ji}$, $w_{ji}$, and $k_{ji}$ will be given in the
next section.

A most important concern about agents is their decisionmaking capability. At
any point in time, an agent can make a decision, which is represented by the decision
variable

$$d_i(t) \in D_i = \{d_0, d_{i1}, d_{i2}, \ldots\}.$$

Each point in space $D_i$ is a distinct decision unique to agent $A_i$, except for $d_0$, which
represents the null decision, and is common to every agent's decision space. Decisionmaking
is what allows an agent to (partially) control future values of its state, and, to
a lesser degree, the future values of the global system state.

Each agent has a next state function $f_i$, where

$$x_i(t+1) = f_i(x_i(t), d_i(t)).$$

The  next  state  is  a  function  of  the  current  state,  and  of  the  current  decision  an  agent 
makes. 

Finally, each agent has a decision rule, or strategy,

$$d_i(t) = \gamma_i(z_i(t), s_i(t)).$$

An agent's strategy takes into account influences (transferred work and state information),
and generated work, and produces a decision. Note that since decisions are a
function of influences and generated work, the next state $x_i(t+1)$ may be thought of
as a function of the current state $x_i(t)$, of the current influences $z_i(t)$, and of the
currently generated work $s_i(t)$. Ultimately, the agent decides how it will allow itself
to be affected by influences and generated work. To emphasize this, the next state is
defined solely as a function of the current state and of the current decision.


In summary, the model $M_i$ of agent $A_i$ is given by the 8-tuple:

1. $x_i \in X_i = \{x_{i1}, x_{i2}, \ldots\}$    ($x_i(t)$ = $A_i$'s state)

2. $s_i \in W = \{w_0, w_1, w_2, \ldots\}$    ($s_i(t)$ = $A_i$'s generated work)

3. $P_i^w : W \times N \to [0,1]$    ($P_i^w(s_i(t), t)$, work distribution)

4. $w_i = (w_{1i}, w_{2i}, \ldots, w_{Ni})$    ($w_i(t)$ = $A_i$'s transferred work)

5. $k_i = (k_{1i}, k_{2i}, \ldots, k_{Ni})$    ($k_i(t)$ = $A_i$'s global state information)

   $z_i = (z_{1i}, \ldots, z_{Ni})$, $z_{ji} = (w_{ji}, k_{ji})$    ($z_i(t)$ = $\mathcal{A}$'s influence on $A_i$)

6. $d_i \in D_i = \{d_0, d_{i1}, d_{i2}, \ldots\}$    ($d_i(t)$ = $A_i$'s decision)

7. $f_i : X_i \times D_i \to X_i$    ($f_i(x_i(t), d_i(t))$, next state)

8. $\gamma_i : ((W, X_1), W) \times \cdots \times ((W, X_N), W) \to D_i$    ($\gamma_i(z_i(t), s_i(t))$, decision rule)


It will be useful to consider vectors of the structures presented above for the
entire system, i.e., for all agents. This will be done by simply dropping the agent's
subscript. For example, the global system state is $x(t) = (x_1(t), \ldots, x_N(t))$, the global
decision is $d(t) = (d_1(t), \ldots, d_N(t))$, and so on.
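
To make the roles of the state, the strategy, and the next state function concrete, the agent model can be sketched as a small data structure. This is a minimal sketch under stated assumptions: the queue-length state, the decision name "enqueue", and the functions `toy_next_state` and `toy_strategy` are hypothetical and chosen only for illustration; the work distribution is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

W0 = None        # null element of work, w_0
D0 = "null"      # null decision, d_0

# z_ji = (w_ji, k_ji): transferred work and state information from agent j.
Influence = Tuple[Optional[str], Optional[int]]


@dataclass
class Agent:
    state: int                                           # x_i(t), here a queue length
    next_state: Callable[[int, str], int]                # f_i : X_i x D_i -> X_i
    strategy: Callable[[Sequence[Influence], Optional[str]], str]  # gamma_i

    def step(self, influences: Sequence[Influence],
             generated_work: Optional[str]) -> str:
        """Advance one time unit: d_i(t) = gamma_i(z_i(t), s_i(t)),
        then x_i(t+1) = f_i(x_i(t), d_i(t))."""
        decision = self.strategy(influences, generated_work)
        self.state = self.next_state(self.state, decision)
        return decision


def toy_next_state(state: int, decision: str) -> int:
    """Hypothetical f_i: enqueueing adds one unit of work to the queue."""
    return state + 1 if decision == "enqueue" else state


def toy_strategy(influences: Sequence[Influence],
                 generated_work: Optional[str]) -> str:
    """Hypothetical gamma_i: enqueue whenever any non-null work is present."""
    has_transferred = any(w is not W0 for w, _ in influences)
    if generated_work is not W0 or has_transferred:
        return "enqueue"
    return D0
```

Note that the state changes only through the decision, which mirrors the point that the next state is defined solely as a function of the current state and the current decision.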


3.2.2.  The  Influence  Model 

The links of graph $G$, labeled $Z_{ji}$, represent inter-agent influences; the variable $z_{ji}$
models the influence of $A_j$ on $A_i$. As previously mentioned, influences are of two
varieties: work and information.

Work influence, or work transferred from $A_j$ to $A_i$, is given by the variable

$$w_{ji}(t) \in W = \{w_0, w_1, w_2, \ldots\}.$$

The value of $w_{ji}(t)$ is a point in the space $W$.

Information influence, or $A_i$'s information concerning $A_j$'s state, is given by the
variable

$$k_{ji}(t) \in X_j = \{x_{j1}, x_{j2}, \ldots\}.$$

The value of $k_{ji}(t)$ is a point in $A_j$'s state space $X_j$. Together, $w_{ji}(t)$ and $k_{ji}(t)$ make
up $A_j$'s influence on $A_i$ at time $t$, which is given by the variable

$$z_{ji}(t) = (w_{ji}(t), k_{ji}(t)).$$

What determines the values of $w_{ji}(t)$ and of $k_{ji}(t)$? Since $w_{ji}(t)$ represents a
transfer of work from $A_j$ to $A_i$, $A_j$ must have made a decision, some time before $t$, for
the request to be present at agent $A_i$ at time $t$. This relationship between the arrival
of transferred work and the decision which caused the work to be transferred is given
by the work function $g_{ji}$,

$$w_{ji}(t) = g_{ji}(d_j(\tau)), \quad \tau < t.$$

The time $\tau$ will generally depend on factors such as the transmission time between $A_j$
and $A_i$, message processing delays, and so on. For our purposes, this value $\tau$ need not
be made precise. We are only interested in the fact that the decision by $A_j$ must be
made at some time $\tau$ preceding the reception of the transferred work $w_{ji}(t)$ by $A_i$.

Similarly, $A_i$'s information concerning the state of $A_j$ is based on a past communication
by $A_j$ to $A_i$, decided upon by $A_j$. This relationship between the arrival of
information and the decision which caused the information to be sent is given by the
information function $h_{ji}$,

$$k_{ji}(t) = h_{ji}(d_j(\tau)), \quad \tau < t.$$

Again, the important relationship is $\tau < t$. In particular, during the time interval
$(\tau, t]$, a state change in $A_j$ can occur. Thus, $k_{ji}(t)$ may not accurately reflect the state
of $A_j$ at time $t$. We will explore the ramifications of this shortly.

Summarizing the influence structures, link $Z_{ji}$ in $L$, which represents the
influence of agent $A_j$ on $A_i$, is defined by the following 4-tuple:

1. $w_{ji} \in W = \{w_0, w_1, w_2, \ldots\}$    ($w_{ji}(t)$ = work transferred from $A_j$ to $A_i$)

2. $k_{ji} \in X_j = \{x_{j1}, x_{j2}, \ldots\}$    ($k_{ji}(t)$ = $A_i$'s state information about $A_j$)

3. $g_{ji} : D_j \to W$    ($g_{ji}(d_j(\tau))$, work function, $\tau < t$)

4. $h_{ji} : D_j \to X_j$    ($h_{ji}(d_j(\tau))$, information function, $\tau < t$)
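
The difference between transient work and persistent, possibly stale, information on a link can be sketched as follows. This is an illustration only: the fixed integer delay and the `Link` class are assumptions made for the example, since the model requires nothing more than that the sending decision precede the arrival.

```python
from typing import List, Optional, Tuple


class Link:
    """Toy model of a link from agent j to agent i with a fixed delivery delay.

    The fixed delay is an assumption; the model only requires tau < t.
    """

    def __init__(self, delay: int = 1) -> None:
        if delay < 1:
            raise ValueError("delay must be at least 1 so that tau < t")
        self.delay = delay
        self._in_flight: List[Tuple[int, Optional[str], Optional[int]]] = []
        self.k: Optional[int] = None   # k_ji: last reported state of A_j (persistent)

    def send(self, tau: int, work: Optional[str] = None,
             info: Optional[int] = None) -> None:
        """A_j's decision at time tau puts work and/or state info on the link."""
        self._in_flight.append((tau + self.delay, work, info))

    def receive(self, t: int) -> Tuple[Optional[str], Optional[int]]:
        """Return z_ji(t) = (w_ji(t), k_ji(t)) as seen by A_i at time t."""
        w = None   # w_ji is not persistent: null unless work arrived since last call
        still_in_flight = []
        for arrival, work, info in self._in_flight:
            if arrival <= t:
                if work is not None:
                    w = work
                if info is not None:
                    self.k = info      # overwrite the remembered state of A_j
            else:
                still_in_flight.append((arrival, work, info))
        self._in_flight = still_in_flight
        return (w, self.k)
```

After a message sent at time 0 with a delay of 2 is delivered, later calls to `receive` report null work but keep returning the same state information, even if the sender's actual state has changed in the meantime.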

3.3.  Utility  and  Objective 

All agents have a goal or objective. It is useful then to develop a notion of utility,
something for agents to maximize as their objective. Utility is defined to be a
real-valued function of an agent $A_i$'s state,

$$u_i(x_i(t)) \in \mathbb{R}, \quad 1 \le i \le N.$$

The utility function $u_i(x_i(t))$ maps each state of $A_i$ to a real number. The magnitude
of a state's utility provides a measure of the degree of preference for it relative to
other states.

We also want a notion of global utility, which we denote by

$$u(x(t)) \in \mathbb{R}.$$

$u(x(t))$ maps each global state $x(t) \in X$ to a real number, again indicating the degree
of preference of $x(t)$ over other global states.

Utility  has  been  defined  as  a  measure  of  preference  of  a  state  at  a  single  point  in 
time.  Often,  it  is  more  interesting  to  consider  the  utility  of  a  particular  sequence  of 
states.  Therefore,  we  extend  the  utility  function  to  map  distinct  sequences  of  states  to 
real  numbers. 

A sequence of states for $A_i$ between times $t$ and $t+r$ is denoted by

$$x_i(t, t+r) = (x_i(t), x_i(t+1), \ldots, x_i(t+r)), \quad r > 0,$$

and we denote the utility of $x_i(t, t+r)$ by

$$u_i(x_i(t, t+r)).$$

The utility of a state sequence might be defined by a simple formula, such as for
instance

$$u_i(x_i(t_0, T)) = \sum_{t=t_0}^{T} u_i(x_i(t)).$$

More likely, however, it will be some complex function, possibly not separable as in the
example above, of all the states in the sequence.
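
The separable example can be written directly as a sum over the sequence. This sketch covers only that additive special case; the function name `sequence_utility` and the sample utility are hypothetical.

```python
from typing import Callable, Sequence


def sequence_utility(states: Sequence[int],
                     u: Callable[[int], float]) -> float:
    """Additive utility of a state sequence: the sum of per-state utilities."""
    return sum(u(x) for x in states)


def negative_queue_length(x: int) -> float:
    """Hypothetical per-state utility: shorter queues are preferred."""
    return -float(x)
```

A non-separable utility would instead take the whole sequence as its argument, for example to penalize large swings between consecutive states.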

This extension will also apply to global utility. The utility of the global state
sequence $x(t, t+r)$ is

$$u(x(t, t+r)).$$

We  are  now  ready  to  consider  possible  objective  functions,  which,  like  utility 
functions,  can  be  defined  as  local  or  global.  The  local  objective  function  for  each 
agent  is 

find $d_i \in D_i$ which maximizes

$$J_i(t+1) = E[u_i(x_i(t+1))].$$

Thus, the goal of each agent in this case is to select a decision which will maximize
the expected local utility of its next state. It is an expectation because an agent's next
state is a function of a random variable, the generated work $s_i(t)$.
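
For a finite decision space and a known work distribution, the local objective can be evaluated by enumeration. The sketch below is a simplification, not the formal model: it folds the generated work directly into a hypothetical `transition` function instead of routing it through the decision rule, and the queue model, the utility, and the distribution are made up for illustration.

```python
from typing import Callable, Dict, Iterable


def best_decision(state: int,
                  decisions: Iterable[str],
                  work_dist: Dict[int, float],
                  transition: Callable[[int, str, int], int],
                  utility: Callable[[int], float]) -> str:
    """Return the decision maximizing E[u(x(t+1))], where the expectation
    is taken over the generated work s drawn from work_dist."""
    def expected_utility(d: str) -> float:
        return sum(p * utility(transition(state, d, s))
                   for s, p in work_dist.items())
    return max(decisions, key=expected_utility)


def transition(queue_len: int, decision: str, arrivals: int) -> int:
    """Hypothetical dynamics: arriving work joins the queue only if kept."""
    return queue_len + arrivals if decision == "keep" else queue_len


def utility(queue_len: int) -> float:
    """Hypothetical utility: a queue length of 2 is preferred."""
    return -float((queue_len - 2) ** 2)


# Hypothetical distribution: zero or one job arrives with equal probability.
WORK_DIST = {0: 0.5, 1: 0.5}
DECISIONS = ("keep", "offload")
```

With these numbers, a lightly loaded agent keeps arriving work while a heavily loaded one offloads it, because the expected utility of the next state differs between the two decisions.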

For local state sequences, we would have

find $d_i \in D_i$ which maximizes

$$J_i(t+1, t+r) = E[u_i(x_i(t+1, t+r))], \quad r \ge 1.$$

In this formulation, an agent makes a decision at a particular point in time which
maximizes the expected utility of states for an interval of time in the near future.
Note that during this interval, the agent is free to make more decisions. Thus,
although an agent's decision is made accounting for multiple possible future states, at
each future state the agent can correct for unpredictable events (e.g., due to the stochastic
nature of the system) by making more decisions. Consequently, we call this
objective a step-wise probabilistic optimization.

For the global state case, the objective is:

find $d \in D$ which maximizes

$$J(t+1) = E[u(x(t+1))].$$

For the global state sequence case, the objective is:

find $d \in D$ which maximizes

$$J(t+1, t+r) = E[u(x(t+1, t+r))], \quad r \ge 1.$$

Some assumptions must be made about the knowledge and activity of our agents.
The first is that the only items of information that agents do not know or cannot perfectly
predict are those based on the non-deterministic elements of the system model.
For example, this would include what new work from the agent's private source of
work would appear in the future. Other items which are by nature static, such as the
agents' state spaces $X_1, \ldots, X_N$, can be known by any or all agents, since it would
be possible to endow each agent with such information at the beginning of time. For
example, generated work $s_i(t)$ is selected from the distribution $P_i^w$. Although an
agent cannot predict a future value of $s_i(t)$, it may indeed know the distribution(s) of
the types of generated work arriving.

Finally, it is assumed that everything that can be known at the beginning of time
is common knowledge. The concept of common knowledge [Halp84] is defined constructively
as follows. Consider a propositional modal logic, with propositions, formulas
formed from propositions closed under negation and conjunction, and modal
operators $K_1, \ldots, K_N$. The formula $K_i p$ has the semantics "agent $A_i$ knows proposition
$p$." Define $Ep$ as the conjunction $K_1 p \wedge \cdots \wedge K_N p$, meaning "everybody
knows $p$." Extend this definition so that $E^1 p = Ep$ and $E^{i+1} p = E E^i p$. $E E^i p$ means
"everybody knows $E^i p$"; thus, $E^2 p$ means "everybody knows that everybody knows
$p$." Finally, define $Cp = E^1 p \wedge E^2 p \wedge \cdots$, meaning "$p$ is common knowledge."
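
On a finite model, the operators $K_i$, $E$, and $C$ can be computed as operations on sets of possible worlds. The sketch below assumes a standard partition-based (possible-worlds) semantics, which the text itself does not spell out: each agent has a partition of the worlds into cells it cannot tell apart, and a proposition is the set of worlds where it holds. It reads $Ep$ as "every agent knows $p$" and $Cp$ as "$E^k p$ holds for every $k \ge 1$". The world names and partitions are hypothetical.

```python
from typing import List, Set

World = str
Partition = List[Set[World]]   # an agent's indistinguishability cells


def knows(partition: Partition, p: Set[World]) -> Set[World]:
    """K_i p: the worlds whose whole information cell lies inside p."""
    result: Set[World] = set()
    for cell in partition:
        if cell <= p:
            result |= cell
    return result


def everybody_knows(partitions: List[Partition], p: Set[World]) -> Set[World]:
    """E p: the worlds where every agent knows p."""
    result = knows(partitions[0], p)
    for partition in partitions[1:]:
        result &= knows(partition, p)
    return result


def common_knowledge(partitions: List[Partition], p: Set[World]) -> Set[World]:
    """C p: the worlds where E^k p holds for every k >= 1.

    With partitions, E^(k+1) p is a subset of E^k p, so on a finite model
    the sequence stabilizes and its limit is the desired intersection.
    """
    current = everybody_knows(partitions, p)
    while True:
        nxt = everybody_knows(partitions, current)
        if nxt == current:
            return current
        current = nxt


# Hypothetical two-agent example over three worlds.
AGENT_1: Partition = [{"a"}, {"b", "c"}]
AGENT_2: Partition = [{"a", "b"}, {"c"}]
P: Set[World] = {"a", "b"}
```

In this example everybody knows `P` at world `a`, yet `P` is common knowledge nowhere, because agent 2 cannot rule out world `b`, where agent 1 does not know `P`.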

These assumptions allow us specifically to focus on purely cooperative systems,
which are most interesting when there is a common goal to be achieved, i.e., all agents
try to achieve a single global objective. Consequently, our interests lie in determining
how  to  achieve  coordination  amongst  agents  in  the  best  practical  way  possible,  given 
their  willingness  to  cooperate. 

It is interesting to consider the theoretical question of whether the global objective
of maximizing the utility of the next state or next state sequence can be achieved.
Therefore we end this section with the following theorem.

Theorem: The global objective, "find $d \in D$ such that $J(t+1)$ is maximized," cannot
be guaranteed unless $x(t)$ and $z(t)$ are common knowledge.

Proof: Let $d^* = (d_1^*, \ldots, d_N^*)$ be the global decision which maximizes $J(t+1)$. If
$x(t)$ and $z(t)$ are common knowledge, then every agent $A_i$ has the same information
to determine $d^*$, can make the decision $d_i^*$, and can assume correctly that every other
agent will do the same. (If there is more than one possible $d^*$, then, by convention,
the first $d^*$ computed is selected, assuming each agent tests all $d \in D$ in the same
order.) Of course, if any agent $A_i$ does not know $x(t)$ or $z(t)$, it cannot uniquely
determine $d^*$ (and therefore, cannot determine $d_i^*$, because this decision depends on
the decision of every other agent). Note that the theorem says that $x(t)$ and $z(t)$
must not only be known by all agents, but that $x(t)$ and $z(t)$ must be common
knowledge. Consider the case where $E^2 x(t)$ and $E^2 z(t)$ are true (i.e., everyone knows
that everyone knows $x(t)$ and $z(t)$), but $E^k x(t)$ and $E^k z(t)$, for all $k > 2$, are false.
An agent $A_i$ might say of some other agent $A_j$: "I know $A_j$ knows $x(t)$ and $z(t)$, but
if $A_j$ does not know that I know this, $A_j$ may not make the decision I expect; I must
account for this in the decision that I make." Since $A_i$ cannot count on $A_j$ making
$d_j^*$, it may be statistically better for $A_i$ to make a decision other than $d_i^*$.

More precisely, let $M \ge 1$ be the maximum value such that

[A] $(E^k x(t) \vee E^k z(t))$ is true for all $k \le M$.

If $M$ exists (i.e., $M < \infty$), then for any agent $A_i$,

[B] $K_i K_j (E^{M-1} x(t) \wedge E^{M-1} z(t))$ is false for some $j \ne i$,

but

[C] $K_j (E^{M-1} x(t) \wedge E^{M-1} z(t))$ is true,

where [B] and [C] follow directly from [A]. (If $M = 1$, then $E^{M-1} x(t)$ and $E^{M-1} z(t)$
simply become $x(t)$ and $z(t)$ respectively, in both statements [B] and [C].) As $d_i(t)$
and $d_j(t)$ are mutually dependent (in particular, the value of $d_i(t)$ will depend on
whether $d_j(t)$ equals $d_j^*$), and $d_j(t)$ will generally depend on [C], then, since $A_i$ does
not know [C] (from [B]), there may exist a decision $d_i'$ which $A_i$ considers statistically
better than $d_i^*$ (i.e., the expected consequences of $d_i(t) = d_i'$ are better (the expected
future state utility is greater) than the expected consequences of $d_i(t) = d_i^*$, over all
possible decisions $d_j(t)$, but not as good as the consequences of $d_i(t) = d_i^*$ and
$d_j(t) = d_j^*$). But if $d_i(t) = d_i'$, $J(t+1)$ is not maximized, and therefore the objective
is not realized.

Note that if $x(t)$ and $z(t)$ are not common knowledge, but are known by all
agents, then if every agent $A_i$ behaves such that it computes $d^*$ and simply selects the
$i$th component as its decision $d_i(t)$, then this would achieve the objective! Where is
the trick? It lies in the implicit assumption that agents behave in the manner just
described, specifically, that they do not try to second-guess other agents but that they
all follow the same procedure, and that this rational behavior is common knowledge.
Note that this assumption is not an unreasonable one; it could easily be built into
every agent's decision procedure. (This assumption is similar in spirit to Ginsberg's
"common rationality assumption" [Gins87].) However, it still requires that $x(t)$ and
$z(t)$ are known by all agents, which generally is an unreasonable assumption.

3.4.  Theoretical  and  Practical  Limits 

In the previous section we saw that unless the current global state $x(t)$ and the
current global influence $z(t)$ are common knowledge (or unless $x(t)$ and $z(t)$ are
known by all agents whose rational behavior is common knowledge), agents in a
decentralized control system cannot perfectly achieve a global objective. In a real
system, communication delays and random processes make it most unlikely that either
the current global state or the current global influences could simply be known by
any agent, let alone be common knowledge. This is why the problems
posed by decentralized control systems have been so formidable. And yet, there must
be ways of structuring such systems so that global objectives can be achieved at least
near-optimally; human organizations seem to be able to do this (although their time
constants are much longer).

To  explore  possible  techniques  for  solution,  we  must  clearly  understand  the  limits 
of  what  can  be  done  theoretically,  and  practically.  Let  us  start  with  some  general 
observations  about  the  model  and  its  implications  for  coordinating  agents. 

•  Observation  1:  Each  agent  has  direct  control  over  itself,  and  only  itself. 

An agent's next state $x_i(t+1)$ is directly affected by that agent's decision $d_i(t)$
through its next-state function $f_i(x_i(t), d_i(t))$. Thus, each agent takes part in
determining the global system state in a decentralized control system, but does
so by local actions only.

Conclusion  1:  If  a  global  objective  is  to  be  achieved,  coordination  of  agents  is  a 
necessary  condition. 

•  Observation  2:  Agents  can  allow  other  agents  to  influence  them. 

Although an agent's next state $x_i(t+1)$ is directly affected only by that agent's
decision $d_i(t)$, that agent can be indirectly affected by other agents' decisions.
Each agent can potentially affect remote agent states through communication.

Conclusion 2: Coordination is possible, but only through limited communication.

•  Observation  3:  Each  agent  cannot  predict,  with  complete  certainty,  its  future 
states. 

An agent's decision is a function of its influences and its generated work,
$d_i(t) = \gamma_i(z_i(t), s_i(t))$. Since $s_i(t)$ is a random variable, the agent cannot
predict what its future actions, and therefore what its future states, will be. (Of course,
an agent can always decide to ignore its inputs, but this would generally go
against realizing reasonable objectives.)

Conclusion  3:  Coordination  is  limited  by  random  events,  and  therefore  agents 
cannot  affect  the  global  state  deterministically. 

• Observation 4: Each agent has limited indirect control over other agents.

The future states of remote agents cannot be directly affected by an agent; that
agent can only influence remote agents by communication.

Conclusion 4: Since no communication is instantaneous, coordination is
limited by communication delays, limiting the control that agents have
collectively over the global state.

In  summary,  we  see  that,  to  achieve  a  global  objective  in  a  decentralized  control 
system,  coordination  is  a  necessary  condition,  and  is  possible  (to  a  limited  degree)  in 
our  model  through  communication.  But  there  are  limitations  to  coordinating  agents 
since  communication  is  delayed,  and  agents  cannot  predict  the  future  because  they 
must  respond  to  random  events. 
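
The two functions referred to in the observations above can be sketched in code. The following is a minimal toy, not the model's definition: the integer queue-length state, the acceptance rule inside `decide`, and the service rate of one unit per step are all assumptions made for illustration. Only the shape of the functions (a decision computed from influence and random work, and a next state computed from the agent's own state and decision) is taken from the text.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    """Toy agent whose state is a queue length (illustrative assumption)."""
    state: int = 0

    def decide(self, influence: int, work: int) -> int:
        # Decision function: depends on the influence z_i(t) and generated work s_i(t).
        # Hypothetical rule: accept the new work locally unless the influence
        # (here, a reported remote queue length) is shorter than the local queue.
        return work if self.state <= influence else 0

    def next_state(self, decision: int) -> int:
        # Next-state function: depends only on the agent's own state and decision.
        # One unit of work is served per step (assumed).
        return max(self.state + decision - 1, 0)

    def step(self, influence: int, rng: random.Random) -> int:
        work = rng.randint(0, 2)  # s_i(t) is random, so future states are unpredictable
        decision = self.decide(influence, work)
        self.state = self.next_state(decision)
        return decision
```

Because `work` is drawn at random inside `step`, neither the agent nor its peers can compute the agent's future states exactly, which is the point of Observation 3.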

These  observations  and  conclusions  lead  to  what  we  call  the  two  fundamental 
problems  of  decentralized  control: 

1.  No  agent  can  know  with  certainty  the  current  global  state. 

2.  No  agent  can  know  with  certainty  the  current  actions  of  remote 
agents. 

Knowledge is attained either by computing it or by receiving it. (We can consider
innate knowledge to be received at the beginning of time.) Thus, the appropriate
question is whether an agent can obtain certain knowledge about the global system
state and remote agent actions either by computation or by communication.

Problem 1 results from the fact that, due to finite communication bandwidth, the
global state $x(t)$ cannot be communicated to all agents instantaneously. And, since
inputs to the system are stochastic, the global state $x(t)$ cannot be computed exactly
from previous information.

Problem 2 results from the fact that, due to finite communication bandwidth, the
global influence $z(t)$ cannot be communicated to all agents instantaneously. And,
since inputs to the system are stochastic, the global influence $z(t)$ cannot be computed
exactly from previous information.

It is clear that Problem 1 is indeed a fundamental problem; Problem 2 is more
subtle, but just as critical and just as fundamental. In fact, Problem 2
leads to the corollary that agents cannot simply optimize local objective functions and
expect a global objective function to be optimized. This is best illustrated by the
following example.

Consider the decentralized load balancing problem. Agents (e.g., computers)
receive jobs from outside sources and must determine where each job should be
executed so that the average time a job spends in the system is minimized. Clearly, a
bad situation would be one where one agent has much work to do, while other agents
are idle. A good load balancing decision rule would be one where an agent, if it has
many jobs pending, offloads new jobs to less loaded agents. Say the state of each
agent is characterized by its load, the amount of work it has on hand. Let us now
assume that Problem 1 is actually not a problem: all agents know the global system
state, that is, they all know what each other's load is. Now, if each agent used the
locally optimal decision rule "offload a new job to the least loaded agent," notice what
may happen. At any point in time, there is one least loaded agent (assume there are no
ties). It is possible that new jobs arrive at each agent simultaneously, and they all
decide to offload to the least loaded agent, whose identity they all know perfectly.
This agent will therefore get swamped with work, turning what were locally optimal
decisions into a global disaster. A globally optimal decision would somehow cause
these jobs to be distributed over a number of agents so that no agent is overloaded.
This is difficult to do, exactly because of Problem 2, since agents do not know their
counterparts' current decisions because of the unpredictability of inputs.
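
The swamping effect described above is easy to reproduce. The sketch below assumes one new job arrives at every agent at the same instant and that all agents see the same, perfectly accurate load vector. The function names, the load values, and the second rule (choosing at random among the `k` least loaded agents) are hypothetical and serve only as a contrast; the second rule is not presented here as the technique this dissertation develops.

```python
import random


def offload_all_to_least_loaded(loads):
    """Every agent sends its one new job to the single least loaded agent."""
    new = list(loads)
    target = min(range(len(loads)), key=loads.__getitem__)
    new[target] += len(loads)  # one simultaneous arrival per agent
    return new


def offload_spread(loads, k, rng):
    """Illustrative contrast: each agent picks at random among the k least loaded agents."""
    new = list(loads)
    candidates = sorted(range(len(loads)), key=loads.__getitem__)[:k]
    for _ in loads:  # one simultaneous arrival per agent
        new[rng.choice(candidates)] += 1
    return new


loads = [5, 4, 1, 5, 4, 5, 4, 5]  # hypothetical loads; agent 2 is least loaded
herd = offload_all_to_least_loaded(loads)
spread = offload_spread(loads, k=4, rng=random.Random(0))
print("herd:  ", herd)
print("spread:", spread)
```

With the first rule, the formerly least loaded agent ends up with more work than any agent had before, even though every individual decision was locally optimal.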

There are also some difficult practical problems, such as that of measuring the
utility of a state in a real system. Just recognizing its own current state may be a
problem for an agent, not to speak of the state of a remote agent. Another problem is
implementing a procedure by which an agent selects a decision that achieves the
objective even approximately. This problem is aggravated by real-time constraints: the
time spent deciding is a cost which must be taken into consideration. Ultimately, the best
decision rules will be those that consider the tradeoff between decision quality and
decisionmaking cost.

3.5.  Summary  of  Model  Requirements 

Let  us  review  the  requirements  fulfilled  by  the  model  we  have  introduced  in  the 
previous  sections. 

•  The  basic  objects  of  interest  modeled  are:  machines  (called  agents)  which  carry 
out  work;  work,  to  be  carried  out;  and  information,  to  be  communicated  and 
used  in  making  decisions. 

• The system's distributed nature is modeled. The model allows for a set of control
points or decisionmakers (the agents). It also allows for coordination through
inter-agent influences.

• The basic notion of system activity over time is modeled. Agents progress
through a sequence of states. These states can be described, and agents can
communicate their states to each other.

• Machines are modeled as both autonomous and cooperative. Each agent is
capable of controlling itself. Although it has direct control only over itself, it can
influence remote agents. Furthermore, agents cooperate by allowing other agents to
influence them.

• Utility, the quantification of preferences for the various courses of action, is
modeled. States and sequences of states can be ordered from least to most
preferable, and the degree of preference of each can be described quantitatively.

• Objectives for agents to attain are modeled. This involves maximizing expected
utility, and dealing with the problems of delays in inter-agent influencing, i.e., in
communication of information and transfer of work.

The  model  focuses  solely  on  how  to  achieve  cooperation  efficiently,  assuming  a 
willingness  to  cooperate  amongst  agents.  We  note  again  that  it  does  not  include  the 
possibility  that  they  may  be  adversaries,  or  that  agents  behave  irrationally. 

In this chapter, we present a number of principles to guide the design of large
decentralized control systems, and we describe a variety of general techniques (i.e.,
problem-solving methods based on our design principles), many from different
disciplines, aimed at attacking the problems posed by decentralized control. These
principles and techniques are presented in broad high-level terms, and their implementation
will depend greatly on the application at hand. In the following chapters, we look at
an application of these principles and techniques to a relevant decentralized control
problem facing operating system researchers and designers today, namely that of load
balancing.

The reader should be convinced by now that decentralized control poses
formidable problems where the search for "perfect" solutions may simply be futile. But, of
course, this should not stop us from searching for approximate solutions, solutions
which come close to achieving what perfect solutions would provide. After all, doing
nothing (e.g., opting for centralized control solely because decentralized control seems
too hard) may be much worse than adopting an approximate solution to the
decentralized control problem. The central point of this research is that we can do better
than simply giving up. The question is, how?

We should first realize that other disciplines have had to deal with similar
problems. Perhaps something could be learned from them. It should be encouraging to
observe that decentralized control systems, such as those where the agents are
humans, work, and work reasonably well; what is the secret of their success?

Since the problems mainly stem from decisionmaking with incomplete knowledge,
the following disciplines come to mind as potentially helpful: probability theory,
statistical inference, decision theory, game theory, and a number of subareas within
artificial intelligence which are mentioned below. Probability theory teaches us how
to model stochastic behavior. Statistical inference teaches us how to make predictions
about future events using knowledge of the frequency of past events, and of patterns
within sequences of past events. Decision theory, along with utility theory or the
theory of preferences, shows us how to make the right decisions given probabilistic
information. In fact, since we are interested in multiple decisionmakers, the subfield
of team decision theory gives us insight about the structure and properties of our
problems. Game theory tells us how to make optimal probabilistic decisions
(especially where optimal deterministic decisions do not exist), taking into account
multiple decisionmakers.

Research in artificial intelligence has also given us a wealth of knowledge about
using heuristics, dealing with uncertainty, searching large solution spaces guided by
constraints imposed by the problem, reasoning about distributed knowledge, reasoning
about the activities of multiple agents, and distributed problem solving. The
development of these techniques has been driven by the desire to solve extremely hard
problems, and by the recognition that these problems, although they cannot be solved
perfectly, can be solved in some near-optimal fashion.

Each of the disciplines discussed above has had an impact on our choice of
design principles and the problem-solving techniques based on these principles. In
this chapter, we will propose seven principles:

•  knowledge-based  solution; 

•  knowledge  abstraction; 

•  uncertainty  quantification; 

•  directional  heuristics; 

•  information  age  integration; 

•  frugal  communication; 

•  SPACE/TIME  randomization. 

The  first  seven  sections  of  this  chapter  are  devoted  to  these  design  principles.  In  the 
eighth  section,  we  present  a  framework  for  intelligent  agent  design  which  is  based  on 
our  principles.  Finally,  the  chapter  concludes  with  a  summary  section. 

4.1.  Knowledge-based  Solution 

The  first  principle  of  designing  large  decentralized  control  systems  is  to  construct 
a  knowledge-based  solution.  By  this,  we  mean  the  adoption  of  a  particular  philosophy, 
one  that  applies  multiple  pieces  of  case-specific  knowledge  rather  than  a  single  unified 
model,  to  solve  the  problem. 

In general, when one sets out to solve a problem, a model of the problem is
formulated, which tries to capture its essential features. That model is then analyzed,
manipulated, and eventually "solved." If the model is a good one, the results
obtained will also apply to the original problem, which in effect is then also solved.
Of course, this assumes that a model can be developed, but this is not always the
case. Some problems are so difficult that we cannot reduce them to single sufficiently
accurate models. This is generally true of real-world problems where control is
decentralized amongst multiple agents.

Although a single general model cannot be devised, it is often the case that
multiple special-case models, each applying to a specific set of circumstances the agents
would find themselves in, can be constructed. If such a set of special-case models
existed, we could solve the problem, perhaps not in the general case, but certainly in
some limited set of cases. This set of special-case models, or, more generally, this
case-specific knowledge, would allow us to get at least an approximate solution, which
may be much better than nothing at all. In fact, this is the idea behind expert
systems [Haye83], which have been very successful in problem domains where there is
little structure and no single concise model can be devised. (In the case of expert
systems, case-specific knowledge is often in the form of rules: "under the circumstance x,
deduce y.")

In  our  case,  we  make  a  great  effort  to  identify  any  case-specific  knowledge  about 
the  specific  decentralized  control  problem  under  investigation,  and  then  use  that 
knowledge  to  make  better  control  decisions.  What  kind  of  case-specific  knowledge 
might  this  be? 

Consider a decentralized control system's activity in terms of state sequence
realizations: $x(t), x(t+1), \ldots, x(t+T)$. Such a sequence would characterize the
system's activity in the time interval $[t, t+T]$. If we were modeling a real system, we
would generally expect there to be perceivable dependencies between successive states.
For example, if the system is in state $x(t) = x$, we would expect the probability that
the next state is $x(t+1)$ to depend on this state $x$. Although a single distribution
which does not change over time may not exist for $(x(t+1) \mid x(t))$, we might be able
to identify a number of conditional distributions, each one applying under a set of
special conditions.

If the system can recognize these special conditions and therefore know which
conditional distribution to apply, then it can make statistical inferences about the
current state, or even future states, given the knowledge of a past state.
Consequently, knowledge which is specific to a particular system and to a particular
situation, such as the structure of time-domain state dependencies indicated above, would
be used to make educated guesses and predictions about the current system and its
situation.
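
As a rough illustration of this idea, the sketch below keeps one conditional next-state distribution per recognized condition and uses the applicable one to infer the likely state some steps after the last observation. The condition names (`"daytime"`, `"night"`), the two-valued state space, and all probabilities are invented for the example.

```python
# One transition table per special condition (all values hypothetical).
# TRANSITIONS[condition][state][next_state] = P(next_state | state) under that condition.
TRANSITIONS = {
    "daytime": {"low": {"low": 0.6, "high": 0.4}, "high": {"low": 0.2, "high": 0.8}},
    "night":   {"low": {"low": 0.9, "high": 0.1}, "high": {"low": 0.7, "high": 0.3}},
}


def predict(condition, observed_state, steps=1):
    """Distribution over states `steps` time units after observing `observed_state`."""
    table = TRANSITIONS[condition]
    dist = {s: 0.0 for s in table}
    dist[observed_state] = 1.0
    for _ in range(steps):
        nxt = {s: 0.0 for s in table}
        for s, p in dist.items():
            for s2, q in table[s].items():
                nxt[s2] += p * q  # total probability over the previous state
        dist = nxt
    return dist


print(predict("night", "high", steps=1))
print(predict("daytime", "high", steps=3))
```

The agent only has to recognize which condition holds; the inference between observations then costs no communication.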

Case-specific knowledge of the structure of space-domain as well as time-domain
state dependencies is of value. Some states are close to each other in the sense that
they form a group in which they all share a set of interesting features (e.g., their
utilities are nearly the same). It would be useful if such groups of states could be
identified. This knowledge would allow the system to abstract a large number of
low-level states, perhaps too large to manage, into a small number of more meaningful
and manageable high-level states. This reduction could have a great impact on the
efficiency of decisionmaking, especially in decisions which are based on what possible
states the system is in, as we will see later. This reduction would also have a great
impact on communication efficiency, since communicating high-level (instead of
low-level) state knowledge increases the bandwidth of information flow, leading to less
frequent communication of messages.

In any knowledge-based solution, there is always the question of how the
knowledge is represented. There are three types of knowledge we are interested in:
states, inter-state relationships, and uncertainty. We must answer the following
questions: what aspects of the real system do states model, and how are they encoded?
What do inter-state relationships model, and how are they encoded? What does
uncertainty about states and their relationships mean, and how is it modeled and
encoded?

As we will see in the next section, for problems of decentralized control, states
will generally represent ranges of the values of system variables, which measure some
relevant aspect (e.g., the amount of pending work) of each node. They may be
encoded in terms of a statistic about these state variables (e.g., the average value of
the range), or as abstract symbols representing higher-level concepts (e.g., the node is
overloaded with work). Inter-state relationships can represent causal properties
between states, or they can represent properties of similarity or dissimilarity between
states. These relationships can often be conveniently encoded in terms of rules, such
as

if the current state is $x_{ik}$ $\rightarrow$ then the next state is $x_{il}$.

As  for  uncertainty,  it  is  the  topic  of  Section  4.3. 

4.2.  Knowledge  Abstraction 

The  second  design  principle  is  knowledge  abstraction,  and  is  concerned  with  how 
an  agent  organizes  knowledge  so  that  it  is  most  useful.  By  knowledge  abstraction,  an 
agent  transforms  low-level  information  into  higher-level  symbols  which  are  more 
directly  useful  to  the  agent.  The  low-level  information  is  derived  from  a  state  space 
as  given  by  the  model  in  Section  3.2, 

$X_i = \{x_{i1}, x_{i2}, \ldots\}$.

$X_i$ is the state space of agent $A_i$. Although $X_i$ is a local state space, this
discussion applies to the global state space as well.

Each state in the low-level state space identifies a possible configuration of an
agent at a given time instant. For example, if the agent is a computer, then its
low-level state uniquely identifies a possible configuration of values in all memory
locations, registers, and so on. Knowing an agent's state at this level of detail is typically
unnecessary, and most likely unmanageable (as the possible number of states is
extremely large).

Now  define  an  abstract  state  space  simply  as  a  set  of  symbols 

$Y_i = \{y_{i1}, y_{i2}, \ldots\}$,

where each element of $Y_i$ is a non-empty subset of $X_i$,

$y_{ik} \subset X_i$. (4.1)

The subscript $i$ indicates that this is the abstract state space of agent $A_i$, and
subscript $k$ enumerates each state within the abstract state space. In general, an agent
may maintain a number of different abstract state spaces for different purposes. We
will denote by $y_i(t)$ an abstract state variable taking on a value from the abstract
state space $Y_i$ at time $t$. (By convention, variables are printed in italic type, and
constants are printed in boldface type.)

Defining an abstraction, denoted by a symbol $y_{ik}$, of a number of low-level states
$x_{ip}, x_{iq}, x_{ir}, \ldots$ means that the agent finds it more convenient to view any of
these low-level states in terms of a single abstract state $y_{ik}$. The features which
distinguish each of the low-level states may be unnecessary for the functions of the
agent.
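
A small sketch may make the construction concrete. It assumes a low-level state space of ten queue lengths and three named abstract states; both are hypothetical choices, and a real low-level state space would be far too large to enumerate this way. The sketch only shows that each abstract state is a non-empty subset of the low-level state space, as in (4.1).

```python
# Low-level state space: hypothetical queue lengths 0..9.
X = range(10)

# Abstract state space: each abstract state is a non-empty subset of X.
Y = {
    "idle": frozenset({0}),
    "busy": frozenset(range(1, 6)),
    "overloaded": frozenset(range(6, 10)),
}


def abstract(x):
    """Return the names of the abstract states whose subset contains low-level state x."""
    return [name for name, subset in Y.items() if x in subset]


print(abstract(0), abstract(3), abstract(7))
```

Here the subsets happen to partition the low-level space, but (4.1) itself only requires that each abstract state be a non-empty subset.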

Abstraction  allows  an  agent  to  compress  its  knowledge  into  essential  parts:  a 
small  number  of  parts  whose  differences  are  interesting  to  the  agent.  This  reduction 
in the number of objects an agent has to keep track of is important for three reasons:

(1) the agent's knowledge base, the store where it keeps its knowledge (e.g., its data),
and its rule base, the store where it keeps its rules (e.g., its code), are finite;

(2) the time to search the knowledge base and rule base increases as the number of
objects in them increases;

(3)  communication  between  agents  can  take  place  at  the  right  level  of  abstraction, 
that  is,  at  whatever  the  agents  decide  is  most  useful  (e.g.,  most  efficient). 

Points (1) and (2) have to do with the practical space and time constraints in the
implementation of an agent's knowledge base and rule base. The agent's knowledge
base is the store where associations between state symbols and the values they are
believed to have are kept. The agent's rule base is the store where associations of
conditionals and actions are kept. The conditionals are tests about whether a state
symbol has one of a number of given values (e.g., is $y_i \in \{y_{ik}, y_{il}, y_{im}\}$?).
Very simply, the fewer items an agent has to manage, the less space it will need to
store these items, and the less time it will use, on the average, in searching for an
item. Thus, points (1) and (2) say
that abstraction allows agents to make use of their own space and processing time
more efficiently.
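
The two stores described above can be sketched as ordinary data structures. In this minimal illustration the symbol names, the state values, and the action names are all hypothetical; the only feature taken from the text is that the knowledge base maps state symbols to believed values and that each rule pairs a set-membership test on a state symbol with an action.

```python
# Knowledge base: state symbol -> value it is currently believed to have.
knowledge_base = {"y_local": "overloaded", "y_remote": "idle"}

# Rule base: (state symbol, set of values that satisfy the conditional, action).
rule_base = [
    ("y_local", {"overloaded"}, "offload_new_jobs"),
    ("y_local", {"idle", "busy"}, "run_locally"),
]


def fire(kb, rules):
    """Return the actions of all rules whose conditional holds in the knowledge base."""
    return [action for symbol, values, action in rules if kb.get(symbol) in values]


print(fire(knowledge_base, rule_base))
```

With fewer abstract states there are fewer symbols to store and fewer conditionals to scan, which is the space and time saving that points (1) and (2) describe.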

Point (3) has to do with the practical time constraints in the implementation of
inter-agent communication. Time spent communicating (e.g., constructing a message
and sending it, transmitting it over a network, receiving it, and interpreting its
contents) is a cost an agent must take into account. Thus, just the compression of
information due to abstraction reduces send, transmit, and receive times. Also, the
receiving agent will ultimately use the information in some way; it may have to translate
the information to a more abstract form. If the information was already
communicated in the abstract form in which it will be used, then the time that would be lost in
translating it does not have to be consumed. Often, the sending agent will already
have the information needed by a remote agent in the desired abstract form, thus
requiring no extra abstracting by the receiver.

Practically speaking, an agent never even considers the low-level state space $X_i$,
as it is much too large, and its level of detail is unnecessary. An agent's knowledge
base and rule base will only contain abstract states of current interest. In theory,
since each abstract state $y_i \in Y_i$ is a collection of low-level states (from (4.1)), there
exists some mapping of (generally many) low-level states in $X_i$ to each $y_i$. Yet it
would be unreasonable in practice to implement such a mapping as its specification
would be too large.

Ultimately, an agent needs to identify what the current abstract state of interest
is. Rather than implementing a mapping of low-level states to abstract states to do
this, the agent uses an indicator, denoted by $I(x_i)$, $x_i \in X_i$. An indicator is some
readily accessible portion of the low-level state, such as the value of a single memory
location in a computer or a small set of instructions which compute a value, which
will map to the abstract state that the low-level state would map to:

$\exists g$ such that $x_i \in y_i \rightarrow g(I(x_i)) = y_i$, $x_i \in X_i$, $y_i \in Y_i$.

In a sense, the indicator is the abstract state identifier.

As an example, an indicator $I(x_i)$ may be a memory location storing a single
integer which is a function of the machine's state. Depending on whether this integer
is above or below a threshold $T$, one of two abstract states, $y_{i,hi}$ or $y_{i,lo}$,
which comprise the state space $Y_i$, is the current abstract state of the machine.
Thus, the agent's rule base would include the following two rules:

$I(x_i) < T \rightarrow y_i := y_{i,lo}$

$I(x_i) \geq T \rightarrow y_i := y_{i,hi}$

This operation classifies the low-level state information, summarized by the indicator,
into a higher level of abstraction, so that it is more convenient to use. (For instance,
the firing conditions of other rules may depend solely on whether $I(x_i)$ is above or
below the threshold, and not on its actual value.) These types of rules could be
constrained to a fixed format (e.g., check if the state is within an interval, given by a
lower and upper bound), and would then be compiled for efficiency.
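
The fixed-format interval rules mentioned above can be sketched as a single lookup. The function name `classify`, the threshold value, and the state labels are assumptions for illustration; the use of a sorted list of bounds with binary search is one possible way to "compile" such rules, not necessarily the one intended here.

```python
import bisect


def classify(indicator_value, bounds, labels):
    """Map an indicator value to an abstract state by interval lookup.

    bounds: sorted thresholds; labels: one label per interval, len(bounds) + 1 entries.
    A value equal to a threshold falls in the interval above it.
    """
    return labels[bisect.bisect_right(bounds, indicator_value)]


T = 4  # hypothetical threshold
print(classify(3, [T], ["lo", "hi"]))   # indicator below T
print(classify(4, [T], ["lo", "hi"]))   # indicator at or above T
```

The same function extends to more than two abstract states by supplying more bounds, for example `classify(q, [2, 6], ["idle", "busy", "overloaded"])`.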

Note that the existence of an indicator implies that the agent can compute the
abstract state of interest and encode it as part of the low-level state. For some
abstract state definitions, however, this cannot be done efficiently. As an example,
suppose we wanted to characterize the state of a very large memory holding a
constantly changing number of objects, which in general are uniformly distributed
throughout the memory. Suppose also that we did not have available a running count
of the number of objects in existence; if we wanted to know exactly how many objects
were in the memory at a given point in time, we would have to count them at that
time. Let us also assume that the memory can hold a maximum of 2 million objects,
and that we want to determine whether the memory is more than half filled, or less
than half filled.

Consider an indicator which computes the total amount of used memory by
counting the total number of objects. The state is then determined by comparing the
total count to the number 1 million. Counting every object, though, might be too time
consuming. A better indicator could make use of the fact that the objects are
distributed uniformly about memory, and therefore count the number of objects in a small
area. The state is then determined by comparing the count to half the number of
objects that could fit in the small area. From a practical standpoint, this second
indicator would work well. But it would not have the property of always indicating the
correct abstract state: more than half filled, or less than half filled.
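
The two indicators can be compared in a short sketch. The memory is modeled as a list of booleans (slot occupied or not); the memory size, the fill fraction, and the position and width of the sampled area are hypothetical, and the size is far smaller than the 2 million objects of the example so that the sketch runs quickly.

```python
import random


def exact_indicator(memory):
    """Count every slot: always correct, but the cost grows with the memory size."""
    return "more_than_half" if sum(memory) > len(memory) // 2 else "less_than_half"


def sampled_indicator(memory, start, width):
    """Count only a small contiguous area; relies on objects being spread uniformly."""
    area = memory[start:start + width]
    return "more_than_half" if sum(area) > len(area) / 2 else "less_than_half"


rng = random.Random(1)
memory = [rng.random() < 0.7 for _ in range(20_000)]  # roughly 70% full, uniformly spread
print(exact_indicator(memory), sampled_indicator(memory, start=5_000, width=500))
```

If the uniformity assumption fails, the sampled indicator can be wrong: for a memory whose occupied slots are all packed at the front, a sample taken from the empty tail reports "less than half" regardless of the true count.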

What  can  we  say  about  the  second  indicator?  We  can  say  that  there  is  some 
probability  that  it  indicates  the  correct  abstract  state,  and  that,  if  the  probability  is 
high,  it  may  serve  our  needs.  In  the  decentralized  control  systems  of  interest  here, 
this  is  certainly  the  case.  Recall  that  we  want  agents  which  make  good  fast  decisions. 
Probabilistic  indicators  will  allow  bad  decisions  to  be  made  once  in  a  while,  but  these 
bad  decisions  might  be  tolerable  as  long  as  they  occur  infrequently  (unless  of  course 
they  could  never  be  tolerated,  such  as  when  human  lives  depend  on  them).  What  is 
important  is  that  these  indicators  allow  fast  determination  of  that  abstract  state. 

Since agents will use indicators rather than low-level states to determine the
abstract state, it is useful to view the relationships between indicator, low-level state,
and abstract state a little differently. Let us say that the correct state is the abstract
state $y_i \in Y_i$ implied by indicator $I(x_i)$. Thus, in this new formulation it is the
low-level state $x_i$ for which there is a probability in its correspondence to $y_i$:

$$0 \le P(x_i \in y_i) \le 1. \tag{4.2}$$

Consequently, we call $Y_i$ a probabilistic abstraction of $X_i$. This formulation of
abstract states as sets of which low-level states have a probability of membership
closely parallels Zadeh’s notion of fuzzy sets [Zade65] in that fuzzy set elements have a
degree of membership. The formulations differ though in that, unlike the relationship
given by (4.2), the degree of membership of a fuzzy set element is neither random nor
statistical in nature. This is not to say that fuzzy set theory is not applicable to the
principle of knowledge abstraction: on the contrary, it is extremely useful in the
human classification of states, even when the states are based on our probabilistic
indicators. For example, a probabilistic indicator of the busyness of a computer is its
CPU job queue length. A human may describe classes of busyness, such as idle,
not-too-busy, busy, very-busy, overloaded (which collectively one might call a fuzzy
abstraction of the low-level state-space), in terms of fuzzy sets of values of the CPU
job queue length.
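To make the idea of a fuzzy abstraction concrete, the following sketch assigns degrees of membership in the five busyness classes to a CPU job queue length. The triangular shape of the membership functions and every breakpoint below are assumptions chosen purely for illustration; nothing in the text prescribes them.

```python
def triangular(x, left, peak, right):
    """Degree of membership in [0, 1] for a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical breakpoints, in units of queued jobs.
TRIANGLES = {
    "idle": (-1, 0, 2),
    "not-too-busy": (0, 2, 4),
    "busy": (2, 4, 7),
    "very-busy": (4, 7, 10),
}

def memberships(queue_length):
    """Return the degree to which a queue length belongs to each class."""
    degrees = {name: triangular(queue_length, *abc)
               for name, abc in TRIANGLES.items()}
    # "overloaded" ramps up from 7 jobs and saturates at 10 (assumed values).
    degrees["overloaded"] = min(1.0, max(0.0, (queue_length - 7) / 3))
    return degrees
```

Note that a single queue length may belong to two classes at once (for instance, partly not-too-busy and partly busy); these degrees are not probabilities, which is exactly the distinction drawn above.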

Probabilistic  and  fuzzy  abstractions  are  very  useful  in  systems  which  must  deal 
with  uncertainty,  as  is  the  case  of  decentralized  control  systems.  All  state  information 
an  agent  has  about  remote  agents  will  necessarily  be  uncertain  in  nature,  since  the 
agent  cannot  acquire  this  information  instantaneously.  The  relevant  question, 
though,  is  not  whether  the  information  is  true  or  false  (note  that  this  question  cannot 
be  answered  by  the  agent),  but  rather,  "to  what  degree  is  the  information  true?" 
Note that this is really the case in all complex decisionmaking systems. For example,
in a computer operating system, a decision is never made based on exact knowledge of
the  actual  low-level  state  (e.g.,  all  the  values  in  memory,  in  registers,  and  so  on),  but 
rather  on  some  indicator  (e.g.,  the  average  CPU  queue  length)  which  in  some  sense 
captures  an  interesting  feature  shared  by  a  group  of  low-level  states.  The  same  is  true 
in  economic  systems:  the  level  of  employment  may  be  considered  an  indicator  of  the 
health  of  the  economy.  Of  course,  an  increase  in  this  indicator  does  not  always  mean 
the  economy  is  growing,  and  vice-versa,  but  its  value  does  provide  a  rough  estimate  of 
the  growth.  It  is  useful  exactly  because  it  is  a  concise  piece  of  information,  but  has  a 
non-negligible  correlation  to  the  underlying  economic  growth. 

4.3.  Uncertainty  Quantification 

The  third  design  principle  is  uncertainty  quantification,  the  accounting  for  and 
the quantification of underlying system uncertainties. In decentralized control systems,
an agent’s uncertainty about the system’s state and about actions of other
agents  is  a  fundamental  problem,  as  we  have  seen.  The  main  points  we  made  in 
Chapter  3  are  that:  first,  uncertainty  exists;  second,  it  is  the  problem  at  the  root  of  the 
difficulties  in  building  decentralized  control  systems;  finally,  it  cannot  be  ignored,  it 
cannot  be  assumed  to  be  out  of  the  problem,  it  will  not  go  away.  Consequently  one  of 
our  major  focuses  is  on  defining  uncertainty,  on  how  to  quantify  it,  and  on  how  to  use 
it  to  make  good  control  decisions. 

First, what does uncertainty mean? The notion of uncertainty characterizes an
agent’s beliefs about propositions (e.g., what state the system is in) dealing with itself
and its environment. Although a proposition, as an entity in itself, is either true or
false, an agent may believe to varying degrees that it is true or false. This is because
what a belief says about the state of the world, and what the state of the world
actually is, need not be the same. Quantifying uncertainty simply means defining a
measure of confidence for a belief.

Artificial intelligence researchers have recognized the value of qualifying
information with confidence measures ever since the pioneering work of MYCIN [Shor76],
which used certainty factors [Shor75] to express uncertainty of propositions. Today,
there are many other methods in use: Bayesian probability methods [Pear86]; the
Dempster-Shafer theory of belief functions [Shaf76]; fuzzy logic [Zade83]; other
multivalued logics [Gain78]. These methods differ in how uncertainty is represented, such
as the point probabilities used by Bayesian methods, the intervals of uncertainty used
by Dempster-Shafer theory, and the linguistic truth-values used by fuzzy logic.

We will argue that point probabilities meet our needs. As our goal is to find
ways of building decentralized control systems where agents make good fast decisions,
accounting for uncertainty will help in making good decisions, and its implementation
will determine whether fast decisions can be made. Note that we are not necessarily
interested in the development of a representation and of calculi for uncertainty;
however, it is the realization that uncertainty is an integral part of decentralized control
decisionmaking which is central to our work.

We choose to represent the measure of an agent’s uncertainty as a conditional
probability density function (cpdf) over the space of possible beliefs, which are about
the states of remote agents. It is conditional on past state information, and on how
old the information is:

$$p(y_i(t) \mid y_i(t-\tau), \tau).$$

This expression is simply the probability that the agent’s abstract state is $y_i(t)$, given
that the state $\tau$ time units in the past was $y_i(t-\tau)$.

For notational convenience, we will denote this probability by

$$p(y_i(t) \mid y_i(t-\tau)),$$

leaving out the second $\tau$ since the parameter $t-\tau$ implies the age of the state
information, as long as the reader understands that, in general, this probability explicitly
depends also on the age of the information. There will be some instances where the
dependency on the information’s age is not obvious; in those cases, it will be written
explicitly.

An agent can use the cpdf to make decisions which generally assume knowledge
of the current state. Knowledge of the cpdf also allows us to compute the expected
utility of the current (and, by simple extension, of the future) states:

$$E[u(y_i(t)) \mid y_i(t-\tau)].$$

Note  that  point  probabilities  fit  our  needs  for  several  reasons.  First,  probability 
theory  and  statistics  are  disciplines  which  are  well  established  and  well  understood. 
Second,  most  events  of  interest  in  a  distributed  system  occur  with  high  frequency. 
Most  of  the  cpdfs  we  will  use  can  be  built  by  an  expert  who  has  observed  the  system, 
or  by  the  system  after  observing  itself  in  real  time.  Although  frequency  data  may  not 
exist  for  all  points  (e.g.,  for  all  states),  we  have  observed  in  our  experiments  that,  if 
the  abstract  state  space  is  properly  defined,  the  resultant  cpdfs  have  enough  structure 
that  these  unknown  points  can  be  acquired  by  interpolation  (or  extrapolation)  of 
other known points. The cpdfs can then be stored for efficient access as a
3-dimensional array, addressed by integers $k$, $l$, and $m$, where $k$ and $l$ are the indexes of
$Y_i$ selecting $y_{i,k}$ and $y_{i,l}$, and $m \in \{0, 1, 2, \ldots, N\}$ represents an interval of time
$(mT, (m+1)T)$, where $T$ is a fixed period over which the cpdf changes insignificantly,
if at all. Expectations can also be tabulated, statically if the cpdfs are assumed not to
change, or whenever the cpdfs are modified. (There is an implicit assumption that, if
the cpdfs change, they change slowly in time, relative to period $T$. We shall address
this issue shortly.)
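A minimal sketch of such a table lookup follows. The array layout `cpdf[k][l][m]` (probability that the current state has index `l`, given that the state with index `k` was observed at an age falling in interval `m`), the clamping of ages beyond the last tabulated interval, and the function names are all assumptions made for illustration.

```python
def age_index(tau, period, max_index):
    """Map an information age tau to the interval index m with mT <= tau < (m+1)T.

    Ages beyond the last tabulated interval are clamped to max_index (assumption).
    """
    return min(int(tau // period), max_index)

def expected_utility(cpdf, utility, past_state, tau, period):
    """Expected utility of the current state, given a past state and its age.

    cpdf[k][l][m] is assumed to hold P(current state l | past state k, age interval m);
    utility[l] is the utility of abstract state l.
    """
    m = age_index(tau, period, len(cpdf[past_state][0]) - 1)
    return sum(cpdf[past_state][l][m] * utility[l] for l in range(len(utility)))
```

With two abstract states and two age intervals, a lookup costs one index computation and a short sum, which is the kind of fast evaluation the tabulated form is meant to provide. Tabulating the expectations themselves would reduce this further to a single array access.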

Since one of our goals is to integrate uncertainty in decisionmaking, we can make
use of decision theory and utility theory, as they are based on point probabilities
(which is another advantage of using them). A central question we have to answer
is how a scheme based on decision theory can be implemented so that it operates
efficiently.

4.4. Directional Heuristics

The fourth design principle is the reliance on directional heuristics. In our
formal description of decentralized control, we defined the objective function in terms of
the utility of states. An agent makes the decision (a component of the global
decision) which it believes will maximize the expected utility of the next global state, or of
the next sequence of global states. As we saw in Section 3.3, the objective can be
formulated as:

find $d \in D$ which maximizes

$$J(t+1, t+\tau) = E[u(x(t+1, t+\tau))], \quad \tau \ge 1.$$

The reason for maximizing the expected utility is that agents do not generally know
the next global state, or the next sequence of global states, but presumably they know
which states are possible, along with their respective probabilities. Note that it is not
enough for each agent to know the conditional probability density function (cpdf) of
the current state of every other agent,

$$p(x_i(t) \mid x_i(t-\tau_i)), \quad 1 \le i \le N,$$

since in general, state transitions of multiple agents are dependent. Rather, agents
would have to know a joint cpdf of the form

$$p(x_1(t), x_2(t), \ldots, x_N(t) \mid x_1(t-\tau_1), x_2(t-\tau_2), \ldots, x_N(t-\tau_N)).$$

Constructing a solution which requires each agent to know this joint cpdf would not
be at all realistic. In fact, the existence of a single cpdf valid for all times in real
systems is doubtful, to say the least. And, even if agents had access to such a function,
computing expected values over the global state space for a number of points in time
would probably take an excessively long time. Agents need a better mechanism for
achieving the objective.

In dealing with such problems, one approach is the development of an
approximate solution, one which uses a directional heuristic to guide the selection of a
decision. The idea is simply to find decisions which will at least tend to increase the
expected utility of states in the near future. Thus, our very ambitious global
objective is converted to something more reasonable, like

find $d_i \in D_i$ such that

$$E[u(y(t+\tau))] > E[u(y(t))], \quad \tau \ge 1.$$

Note that the states of interest are no longer low-level global states, but abstract
global states, and also that we only care about the positive direction of change in the
expected utility of a future state. This is a common technique used in artificial
intelligence applications, and is often referred to as "hillclimbing" [Wins84].
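The directional objective can be illustrated with a small sketch. It is one possible reading only: the function name, the callable `expected_utility_after` (an agent's own estimate of $E[u(y(t+\tau))]$ for a candidate decision), and the choice to return the largest predicted improvement rather than any improvement are assumptions of this example.

```python
def directional_choice(decisions, expected_utility_after, current_expected_utility):
    """Pick a decision predicted to raise expected utility, or None if there is none.

    Among improving decisions this sketch returns the one with the largest
    predicted gain; the objective itself only asks for a positive direction.
    """
    best, best_gain = None, 0.0
    for d in decisions:
        gain = expected_utility_after(d) - current_expected_utility
        if gain > best_gain:
            best, best_gain = d, gain
    return best
```

The agent never evaluates the full objective over the global state space; it only compares a handful of locally estimated expectations, which is what makes the heuristic cheap.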

We therefore define a general heuristic for approximating the objective (in terms
still not accessible to agents, but this will be corrected with further development of
these ideas). The new objective can be summarized as:

find $\gamma \in \Gamma$ that minimizes the following stepwise loss function:

$$L(t) = L_d(d(t), x(t)) + L_c(\Delta k(t), x(t)) + L_e(\gamma, x(t), k(t)) + L_r(s(t), x(t), d(t)).$$

The strategy or decision rule $\gamma$ (which belongs to the space $\Gamma$ of all strategies) will be
sensitive to four types of loss, given by the four terms in the loss function $L(t)$.

$L_d(d(t), x(t))$ is the loss due to decision quality degradation of $d(t)$, given state
$x(t)$. For each possible value of $x(t)$, there are good decisions and there are bad ones.
Ideally, the decision rule should produce a good decision for a given $x(t)$. $L_d$ is a
measure of how far off the decision made is from the best possible decision, due to
uncertainty about the global state $x(t)$. Note that agents will never base their
decisions on $x(t)$, but rather on an abstract state $y(t)$. A robust decision rule will still
select good decisions (i.e., decisions which are close to the optimal one, thus making
$L_d$ small) even though what is believed based on $k(t)$, and what is true (i.e.,
$x(t) \in y(t)$ or $x(t) \notin y(t)$), may be different. How agents infer $y(t)$ from $k(t)$ will
determine $L_d$.

$L_c(\Delta k(t), x(t))$ is the loss due to communication overhead, given the change in
information $\Delta k(t)$, and given state $x(t)$. When there is no communication, i.e.,
$\Delta k(t) = 0$, there is no communication overhead, and therefore $L_c$ is zero. When there
is communication, there will be a change in information, i.e., $\Delta k(t) \ne 0$, and $L_c$ will
depend on how much communication took place (the magnitude of $\Delta k(t)$), and under
what conditions (the value of $x(t)$). There is a subtle tradeoff between $L_d$ and $L_c$.
Decisions depend on inter-agent influences, which correspond to work requests and
information. If $L_c$ is minimized by not communicating, then the decision function
will be using out-of-date information, potentially causing bad decisions and increasing
$L_d$. Or, $L_d$ could be minimized by making sure that the global system state is known
with high certainty by all agents through a great deal of inter-agent communication.
Good decisions could then be made, but the communication overhead incurred may
be unacceptable. Clearly, the goal is to find the amount of communication between
these extremes which allows fairly good decisions to be made but does not create a
great deal of overhead. We will address this tradeoff shortly.

$L_e(\gamma, x(t), k(t))$ is the loss due to time spent evaluating the decision rule. There
may be many decision rules which provide very good decisions, but these decisions
take an unreasonably long time to compute, as in mathematical programming or
exhaustive search solutions. $L_e$ is a function of the decision rule (thus, $L_e$ is a
functional), the information influence $k(t)$ available to it, and the state $x(t)$.

Finally, $L_r(s(t), x(t), d(t))$ is the loss due to random effects because of the
stochastic nature of the system. The quality of decisions in distributed decisionmaking is
necessarily limited by the random input stream of generated work $s(t)$, since this
randomness makes the system nondeterministic. Consequently, decisions must take the
unexpected into account; these decisions are ones which are designed to work well
under many situations, but are not as good as decisions based on certain information.

Note that the first two terms, $L_d$ and $L_c$, are relatively more sensitive to system
dynamics (i.e., how the global state changes over time) than the last two terms, $L_e$
and $L_r$. This is because once a decision rule has been selected, its efficiency can be
analyzed under worst-case or average-case conditions, and $L_e$ will be sufficiently well
characterized to allow the evaluator to know whether the decision rule is good or not.
Also, $L_r$ is totally a function of the stochastic nature of the system, something which
cannot really be controlled. Consequently, focus is placed on $L_e$ and $L_r$ during the
design phase of the solution (e.g., when algorithmic efficiency is evaluated; when
attempting to guard against improbable but possible worst case situations; and so on),
while $L_d$ and $L_c$ will play a more active role in the agent’s dynamic decisionmaking
activity.

In summary, there are four characteristics for a good decision rule $\gamma$ which seeks
to minimize the total loss $L(t)$. One is that $\gamma$ must provide a good decision based on
reasonable indicators of $x(t)$. Furthermore, it should require little communication, to
be traded off with the quality of information needed to make good decisions. It
should require little computation; $\gamma$ must be efficiently computable. And it should be
robust, to account for the randomness of the inputs.

The  next  major  task  is  to  convert  this  notion  of  loss  into  functions  which  are 
computable  by  agents  using  the  limited  and  uncertain  knowledge  they  possess  about 
the  global  state.  Before  we  can  do  this,  we  must  consider  how  to  quantify  the  quality 
of  decisionmaking  based  on  aging  information.  This  is  the  subject  of  the  next  section. 

4.5.  Information  Age  Integration 

The  fifth  design  principle  is  information  age  integration.  How  does  an  agent 
make  the  "best"  decision  using  information  which  is  not  current?  An  agent’s  decision 
rule  produces  a  decision  based  on  the  agent’s  current  knowledge,  which  is  based  on 
communicated information, which in general will be old since communication cannot
occur  continuously  or  instantaneously.  This  simple  fact  tells  us  that  knowledge  which 
does  not  quickly  become  outdated  is  most  desirable  to  an  agent.  Thus,  in  designing  an 
agent’s  knowledge  space,  the  following  four  goals  should  be  achieved: 

(1) given a decision space $D_i$, the level of abstraction of states should be chosen as
one which allows efficient selection of decisions from $D_i$;

(2) abstract state spaces where states change slowly in time, relative to inter-agent
communication delays, should be sought;

(3) a measure of an agent’s confidence in these state abstractions, as a function of
their "age," ought to be developed;

(4) these state abstractions and their confidence measures should be incorporated as
an integral part of decisionmaking.

Achieving these goals will usually require a careful study of the particular
application at hand. We will consider each goal in the abstract here, and look at a
concrete realization of these goals for a specific application in the following chapters.

4.5.1.  Choosing  the  Right  Abstract  State  Space 

First, an abstract state space, appropriate for the decision space, must be defined.
Consider the decision space

$$D_i = \{d_1, d_2, d_3, \ldots, d_K\}.$$

For simplicity, we will assume that every agent has the same decision space $D_i$.
What abstract state space would give the appropriate level of state differentiation so
that the selection of a decision is meaningful and efficient? In the previous section we
discussed the notion of a loss due to decision quality degradation, $L_d(d(t), x(t))$. The
loss $L_d$ is a measure of the distance between the selected decision and the best possible
decision. Given $x(t)$, the next-state function $f(x(t), d(t))$, and the utility function
$u(x(t))$, it should be possible, in theory, to determine for every state what the best
decision is. Thus, we could partition the state space $X$ into $K$ parts, each
corresponding to a decision which is best given that the state is in that partition. Figure 4.1
shows an example of such a partitioning where $K = 5$.


Figure 4.1. State Space Partition

In general, we can expect the state space $X$ to be extremely large, not only in
absolute terms, but also relative to the decision space $D_i$. Thus, a particular decision
may potentially be the best for many low-level states, and such a many-to-one
mapping would eventually have to be implemented. It would be unreasonable to expect
each low-level state to be listed, followed by the decision to be taken, such as in a set
of rules like

$x_1 \to d_5$
$x_3 \to d_2$
$x_5 \to d_4$
$x_6 \to d_4$
$x_7 \to d_1$
$x_8 \to d_1$
$x_9 \to d_3$
$x_{10} \to d_3$
$x_{11} \to d_3$

Even  if  implemented  with  a  data  structure  which  could  be  efficiently  searched, 
the  number  of  rules  would  take  up  too  much  space.  More  important,  agents  would 
never  deal  directly  with  low-level  states;  rather,  they  would  use  an  indicator  whose 
values  correspond  to  abstract  states.  Therefore,  what  is  the  proper  abstract  space? 

Very simply, the proper abstract space would be one which allowed a
differentiation of low-level states similar to the partition imposed by the decision
space. Thus, abstract space $Y$ should have the property that, if $x \to d$ and $x \in y$,
then $\forall z \in y$, $z \to d$, where $x, z \in X$, $y \in Y$, and $d \in D$. This says that the low-level
states which make up a single abstract state should all imply the same decision.
Otherwise, the abstract space does not differentiate low-level states properly.
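The property above can be checked mechanically when the best decision for each low-level state is known, as in the following sketch. The dictionary `best_decision` (mapping each low-level state to its best decision) and the callable `abstraction` (mapping a low-level state to its abstract state) are hypothetical stand-ins; in a real system the former would rarely be available in full.

```python
def is_decision_consistent(best_decision, abstraction):
    """True if all low-level states sharing an abstract state share a best decision."""
    decision_of = {}
    for x, d in best_decision.items():
        y = abstraction(x)
        # Record the first decision seen for y; any later disagreement
        # means the abstraction mixes states that call for different decisions.
        if decision_of.setdefault(y, d) != d:
            return False
    return True
```

An abstraction that fails this check forces the agent to pick one decision for states that actually call for different ones, which shows up directly as decision quality loss.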

It  would  be  ideal  if  such  an  abstract  space  could  systematically  be  constructed. 
In  practice,  however,  the  partitioning  of  the  low-level  state  space  must  be  studied, 
with  the  goal  of  finding  underlying  similarities  among  the  states  within  a  partition. 
An indicator which captures these similarities must then be found. As we discussed in
Section  4.2,  a  perfect  indicator,  one  which  always  maps  the  low-level  state  to  the 
correct  abstract  state,  may  not  exist.  Fortunately,  our  requirements  are  soft  enough 
to  allow  indicators  that  have  a  high  probability  of  selecting  the  correct  abstract  state. 
In  the  end,  the  indicator  will  define  the  abstract  state  space  to  be  used.  It  will  be  a 
good  indicator  if  the  abstract  state  space  it  defines  reasonably  differentiates  low-level 
states  into  groups  which  reflect  the  partitioning  of  the  low-level  state  space  by  the 
decision  space. 

In practice, the decision space, the abstract state space, and the indicator are
often not difficult to design since our main focus is on decentralized resource control
problems. The decision space is made up of decisions mainly to transfer work as well
as information between agents, in such a way as to obtain a higher level of
performance than if no work or information transfers were allowed, i.e., if agents were
totally isolated. The abstract state space generally characterizes the agent’s (for the
local space) or the system’s (for the global space) capacity to do work; each state
represents a different amount of available capacity, or, from the opposite viewpoint,
pending work. An indicator is simply an index of the amount of pending work, such
as a queue length or the utilization factor of some major resource controlled by the
agent.

4.5.2.  Abstract  State  Spaces  with  Slow  Transition  Rates 

Goal (2) in Section 4.5 was to select an abstract state space where states change
slowly in time, relative to inter-agent communication times. Decentralized control
implies affecting a system’s global activity at a level controllable by a distributed set
of agents. An agent can only have an effect on a remote part of the system by
(implicitly or explicitly) communicating something to it. Thus, the speed at which the
system responds to control commands (which are the consequences of decisions) issued by
agents is constrained, to a large degree, by communication times. We saw in the last
section that it is the decision space that drives the definition of the abstract state
space. The whole purpose behind obtaining state information is to make informed
decisions. Therefore, if states change rapidly (i.e., multiple times during a single
communication time interval between agents), then the level of activity captured by the
state changes is too detailed for adequate decentralized control.

In  fact,  one  can  argue  that  states  must  change  slowly,  so  that  a  communication 
time interval is a fraction of the time between state transitions, otherwise the
communication necessary to update state information in each agent would generate
excessive overhead. Of course, if we can take advantage of state transition dependencies,
so  that  an  inference  about  a  future  state  can  be  made  based  on  past  information,  this 
will  help.  But  it  will  also  help  if  we  can  design  the  abstract  state  space  in  such  a  way 
that  state  transitions  occur  as  slowly  as  possible,  while  maintaining  the  property  of 
differentiating  low-level  states  for  good  decisionmaking. 

One design guideline that follows from these considerations is that states should
be selected so that their minimum duration is larger than the communication time
interval. For example, consider the state transition sequence in Figure 4.2.

Figure 4.2. State Transition Sequence (the legend marks one communication time interval)


The time spent in states $y_3$ and $y_4$ is significantly smaller than the
communication time interval. Thus, the activity which they represent cannot be effectively
controlled by a remote agent, nor can a remote agent be effectively influenced by this
level of activity. One solution would be simply to disregard them, and consider the
previous state still to be in effect. This is similar to applying a low-pass filter, which
selectively removes high frequency components from a signal.
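The "disregard short states" rule can be sketched as follows. The representation of a state sequence as (state, duration) pairs and the function name are assumptions of this example; the treatment of a short state at the very start of the sequence (it is kept, since no previous state exists) is likewise our own choice.

```python
def drop_short_states(sequence, min_duration):
    """Disregard states shorter than min_duration.

    sequence is a list of (state, duration) pairs. A short state is absorbed
    into the previously retained state, which is considered still in effect.
    """
    filtered = []
    for state, duration in sequence:
        if filtered and (duration < min_duration or filtered[-1][0] == state):
            prev_state, prev_duration = filtered[-1]
            filtered[-1] = (prev_state, prev_duration + duration)
        else:
            filtered.append((state, duration))
    return filtered
```

Taking `min_duration` to be the communication time interval, the two brief states in a sequence such as the one in Figure 4.2 would be folded into the state preceding them.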

As previously mentioned, state spaces for decentralized control applications
typically capture the notion of available capacity for doing work; for instance, each state
might correspond to a different degree of pending work (the greater the amount of
pending work, the less the available capacity). Say this degree was quantified by a
real-valued measure $M(y_i) \in R$ defined on the state space $Y_i$. We can then divide time
into small intervals, of equal duration $T$ (where $T$ is much smaller than any
inter-state transition time), sample the state at the end of each such time interval, and
produce a time-series of real numbers derived from the states. Time-series techniques
equivalent to low-pass filters, such as those that produce moving-average or
autoregressive processes, can then be applied.

The $N$-order moving average of $M(y_i(t))$ is defined as:

$$m_i(n) = \sum_{k=0}^{N} w_k \cdot M(y_i((n-k) \cdot T)), \tag{4.3}$$

and the $N$-order autoregression on $M(y_i(t))$ as:

$$r_i(n) = M(y_i(n \cdot T)) + \sum_{k=1}^{N} w_k' \cdot r_i(n-k). \tag{4.4}$$

In (4.3) and (4.4), $n$ is the time interval index, and the coefficients $w_k$ and $w_k'$,
$0 \le k \le N$, are constants. $m_i(n)$ or $r_i(n)$ could then be maintained by each agent.

We can now define a new abstract state space $Y_i'$, which corresponds to a
partition of the real numbers into non-overlapping intervals, with each interval centered
about the points defined by $M(y_i)$, for each $y_i \in Y_i$. The abstract state implied by
$m_i(n)$ or $r_i(n)$ would be the one which maps to the interval (defined by the partition)
containing $m_i(n)$ or $r_i(n)$. Thus, this abstract state space $Y_i'$ has a close
correspondence to the original space $Y_i$, and yet it has the desirable property that states of
small duration are filtered out.
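The filtering and re-partitioning steps can be sketched together. Several details here are assumptions made for illustration: the moving average is only produced once a full window of samples exists, the autoregression treats values before the first sample as zero, and a filtered value is mapped to the state whose center $M(y_i)$ is nearest, which is one way to realize intervals centered about those points.

```python
def moving_average(samples, weights):
    """Moving average as in (4.3): m(n) = sum over k of w_k * M(n-k).

    samples[n] is the measure sampled at the end of interval n; weights[k]
    is w_k. Output starts at the first n with a full window of history.
    """
    order = len(weights) - 1
    return [
        sum(w * samples[n - k] for k, w in enumerate(weights))
        for n in range(order, len(samples))
    ]

def autoregression(samples, weights):
    """Autoregression as in (4.4): r(n) = M(n) + sum over k>=1 of w'_k * r(n-k).

    weights[0] is w'_1; terms reaching before the first sample are skipped.
    """
    r = []
    for n, sample in enumerate(samples):
        feedback = sum(w * r[n - k]
                       for k, w in enumerate(weights, start=1) if n - k >= 0)
        r.append(sample + feedback)
    return r

def nearest_state(value, centers):
    """Index of the abstract state whose center M(y) is closest to value."""
    return min(range(len(centers)), key=lambda j: abs(value - centers[j]))
```

For example, with four equal weights of 0.25, a one-sample spike of pending work from 0 to 4 produces filtered values of 1.0, which still map to the state centered at 0 rather than the state centered at 4; the short-lived state never appears in the filtered space.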

Two final points are in order. The first is that, in practice, the measure $M(y_i)$,
$y_i \in Y_i$, is unnecessary; the indicator $I(x_i)$, $x_i \in y_i$, can be conveniently used in the
moving-average or autoregression formula instead. The second point is to realize that
$Y_i'$ is a probabilistic abstraction of $X_i$. As long as the agent’s decision rules take into
account the probabilistic nature of the state information, good decisions can be made,
and communication overhead is reduced.

We  are  now  ready  to  explore  exactly  how  such  probabilistic  information  is 
accounted  for  in  decisionmaking.  This  will  allow  us  to  achieve  our  goal  of  integrating 
aging  information  in  the  decisionmaking  process. 

4.5.3.  Decisions  and  Utility 

We have concentrated on the characteristics of the abstract state space; let us
now focus our discussion on decisionmaking and utility. In decentralized resource
control problems, agent decisions have to do with transferring work and information,
and the abstract states have to do with an agent’s capacity to accept work. Consider
a decision to transfer work. By this, we mean that an agent sends to another agent a
request to do work. This will require sending a message to communicate the request,
along with any data necessary to carry out the work. The type of message and data
that is communicated will depend on the application. For instance, in network
routing, the message is to forward an information packet, and the data is the information
packet to be forwarded; in load balancing, the message is to execute a job, and the
data is the job itself (or at least the name of the job, assuming the remote agent has a
copy of it) plus associated data files.

In  general,  we  must  consider  three  aspects  for  a  decision  to  transfer  work. 

(1)  when  to  transfer  work; 

(2)  what  work  to  transfer; 

(3)  to  which  agent  to  transfer  work. 

Consideration (1) is driven either by the input (e.g., the arrival of a packet or a
job), or by a perceived change in conditions that causes an agent with pending
work, not transferred in the past, to transfer some of it (e.g., store-and-forwarding of
packets, or job migration). In the first case, it is the environment which triggers
the work transfer. In the second case, it is the agent which must
detect the change in the environment. This requires a decision of when the agent
should observe the environment; this decision should be judicious, since there will
most likely be a cost in observing.

Consideration  (2)  is  something  the  agent  must  decide,  but  the  decision  about 
what to transfer should not depend on the states of remote agents (assuming no special
dependencies between agents and work). In fact, this consideration depends
mostly  on  the  application  at  hand  rather  than  on  peculiar  properties  or  requirements 
of  decentralized  control.  For  example,  in  load  balancing,  where  at  a  given  point  in 
time  there  are  potentially  many  jobs  within  an  overloaded  machine  to  select  for 
offloading,  the  job  to  offload  may  depend  on  the  job  characteristics,  like  the  expected 
remaining  execution  time,  and  not  on  remote  agent  characteristics. 

Consideration  (3),  to  which  agent  to  transfer  work,  is  a  decision  which  is  driven 
by  the  agents  themselves,  and  the  basis  for  the  decision,  unlike  (2),  should  depend  on 
the  states  of  remote  agents.  We  will  concentrate  first  on  the  selection  decision,  i.e.,  to 
which agent to transfer work. In Section 4.6, we will consider the observation decision,
i.e., when to observe the states of remote agents (e.g., when to communicate
with  remote  agents).  As  for  consideration  (2),  it  is  best  to  deal  with  this  in  the  next 
two  chapters  which  focus  on  a  load  balancing  application. 

How do we construct a good decision rule for selecting to which agent to transfer
work (which includes the possibility of keeping the work locally), given past state
information about remote agents? This state information will generally have the
form:

$$(y_1(t-\alpha_{1i}),\; y_2(t-\alpha_{2i}),\; \ldots,\; y_N(t-\alpha_{Ni}))$$

This is the information about remote agents that some agent $A_i$ will have, namely
$k_i(t)$, except that, rather than low-level states as defined in the model of Section 3.2,
it is made up of remote agent abstract states. The value $\alpha_{ji}$ represents the
age of the information about agent $A_j$'s state, known by $A_i$. (Maintaining our
convention that the ordering of subscripts, in this case $j$ followed by $i$, corresponds to the
direction of influence, it is $A_j$ that influences $A_i$ concerning the age of
information about itself.) $\alpha_{ji}$ will increase with time until a communication from agent $A_j$ is
received, at which time $\alpha_{ji}$ is set to the transmission time between the remote agent
and the receiving agent (which can be derived either from the timestamp on the
message, i.e., the time the message was transmitted by the remote agent, assuming
synchronized clocks, or simply by using a precomputed expected value). Later, we will
discuss the best time for this communication to take place. For now, let us
concentrate solely on how to use aging state information in the remote agent selection
decision.
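
The bookkeeping described above can be sketched in Python as follows. This is a minimal illustration that assumes synchronized clocks and timestamped messages; the class and method names are hypothetical and do not come from the text.

```python
class RemoteView:
    """What an observing agent A_i records about one remote agent A_j (hypothetical helper)."""

    def __init__(self):
        self.state = None      # last communicated abstract state of A_j
        self.valid_at = None   # time at which that state was true at A_j

    def receive(self, state, sent_at):
        """Store a state update stamped with the time A_j transmitted it."""
        self.state = state
        self.valid_at = sent_at

    def age(self, now):
        """Age alpha_ji of the stored information; infinite if nothing was ever received."""
        if self.valid_at is None:
            return float("inf")
        return now - self.valid_at

if __name__ == "__main__":
    view = RemoteView()
    view.receive(state=5, sent_at=100.0)   # message sent by A_j at time 100
    print(view.age(102.0))                 # age on arrival equals the transmission time
    print(view.age(110.0))                 # age keeps growing until the next update
```

With this representation the age at the moment of receipt is exactly the transmission time, and it grows linearly afterwards, as the paragraph describes.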

Let

$$\Theta = \{\theta_0, \theta_1, \ldots, \theta_{10}\}$$

be the agent's abstract state space, where the higher the state subscript, the greater
the degree of pending work the agent has when in that state. State $\theta_0$ represents the
state where there is no work pending, and state $\theta_{10}$ represents the state where there is
a maximal amount of work pending (and therefore the agent cannot receive any
additional work). For simplicity of exposition, assume a homogeneous set of agents, so
that this abstract space is the same for all agents (i.e., $Y_i = \Theta$, $1 \le i \le N$).

Given  this  abstract  state  space,  and  noting  that  each  agent  has  state  information 
with  different  ages  about  other  agents,  how  is  the  selection  decision  made?  The  agent 
which must select where to transfer work could simply disregard the age of its information,
and select the agent with the "best" state, the one corresponding to the least
amount  of  pending  work.  This  would,  of  course,  be  naive,  but  it  serves  the  purpose  of 
pointing  out  the  pitfall  of  not  taking  age  of  information  into  account.  The  problem  is 
that  the  state  information  about  a  remote  agent,  if  it  is  very  old,  may  have  no  bearing 
on  its  real  current  state,  leading  to  a  bad  decision  (i.e.,  the  utility  of  the  future  state 
goes  down  rather  than  up).  What  we  need  is  a  quantification  of  how  much  bearing 
past  information  has  on  the  current  situation. 

One thing that could be done is to provide every agent $A_i$ with the set of
conditional probability density functions (cpdf) $p(y_j(t) \mid y_j(t-\alpha_{ji}))$, one cpdf per value of
$\alpha_{ji}$, for a large range of (discrete) values. We call this set a family of cpdfs. (If the
system of agents were not homogeneous, an agent would need a separate family of
cpdfs for each remote agent. Also, assume for the moment that the cpdfs depend only
on age $\alpha_{ji}$, and not on the time $t$.) If an agent $A_i$ knew a past state value
$y_j(t-\alpha_{ji}) = \theta_k$, it could then determine the probability of each state in $\Theta$ being the
current state.

At this point, it may seem reasonable for $A_i$ to compute the mean amount of
pending work, using the expected value of $M(y_j(t))$ given $y_j(t-\alpha_{ji})$:

$$E[M(y_j(t)) \mid y_j(t-\alpha_{ji})] = \sum_{\theta \in \Theta} M(\theta) \cdot p(y_j(t) = \theta \mid y_j(t-\alpha_{ji}))$$

$A_i$ can then compute $E[M(y_j(t)) \mid y_j(t-\alpha_{ji})]$ for all remote agents $A_j$, and
consider the agent with the minimum mean amount of pending work to be the
optimal agent for transferring work. Unlike the previous approach, this takes aging
information into account, but it still has serious problems, which are illustrated by the
following example.

Say the family of cpdfs $p(y_j(t) \mid y_j(t-\alpha_{ji}) = \theta_5)$, for $\alpha_{ji} \in \{0, 20, 50, \infty\}$, looks
like that shown in Figure 4.3.


Figure 4.3. Example of Conditional Probability Density Function


To ease the visualization of each of the cpdfs, they are shown as continuous
functions, even though they are really discrete functions, with probability mass defined by
the height of each curve at the points $\theta_0, \ldots, \theta_{10}$. The cpdfs illustrate a
reasonable hypothesis: as information ages, the number of possible states the remote agent
can be in increases.

For the purposes of our example, the actual values of the probabilities in Figure
4.3 are not important, but the shape of each curve is. In particular, as the
information age increases, the probability mass in the cpdf spreads symmetrically about
$y_j(t-\alpha_{ji}) = \theta_5$. Therefore, computing the mean amount of pending work produces
$M(\theta_5)$, regardless of the information's age $\alpha_{ji}$. Yet it is also a reasonable hypothesis
that the negative consequences of transferring work to a remote agent $A_j$ if it is
in state $\theta_{10}$ (which means it has already reached its maximum capacity for pending
work) outweigh the positive consequences if $A_j$ is in state $\theta_0$. Using the
mean amount of pending work for agent comparison ignores this asymmetry in state
utility.

What we need is a measured evaluation of the consequences of transferring work
to an agent $A_j$ whose state is $y_j \in Y_j$. We call this real-valued quantification the state
utility of agent $A_j$,

$$u_j(y_j) \in \mathbb{R}, \quad y_j \in Y_j$$

Utility is a measure (defined on the agent's state space) which corresponds to the
performance index to be optimized (e.g., average response time, average throughput).

For  example,  Figure  4.4  illustrates  a  likely  state  utility  function  for  an  agent 
from our previous example, whose state space is $\Theta$.

Figure 4.4. Example of State Utility Function


The utility function indicates that, as the state number increases (i.e., as the
amount of pending work increases), not only does utility decrease, but the rate of
decrease increases, meaning that the severity of the negative consequences of
transferring work to an agent in state $\theta_k$ increases with $k$. Thus, transferring work to an
agent $A_j$ whose state is believed to be $\theta_5$ with probability 1, or transferring work to
another agent $A_k$ whose state is believed to be either $\theta_0$ or $\theta_{10}$, each with probability
0.5, are two very different options, with the former being preferable. The concept of
state utility allows us to encode this difference.

Once $u_j(y_j)$ is defined, we are still left with the problem that, since an agent $A_i$
does not know with certainty the state of agent $A_j$, it cannot know with certainty the
state utility of $A_j$. But now, $A_i$ can use the family of cpdfs to compute the expected
state utility, given past state information,

$$E[u_j(y_j(t)) \mid y_j(t-\alpha_{ji})] = \sum_{\theta \in Y_j} u_j(\theta) \cdot p(y_j(t) = \theta \mid y_j(t-\alpha_{ji})) \quad (4.5)$$

This  expectation  tells  us  two  important  things: 

(1)  how  desirable  remote  agents  are,  relative  to  each  other,  as  destinations  of  work  to 
be  transferred; 

(2)  how  the  value  of  information  changes  as  a  function  of  age. 

Item (1) is related to the selection decision, and (2) is related to the observation
decision.
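
As a hedged illustration of formula (4.5) and of the selection decision it supports, the Python sketch below compares two candidate destinations. The quadratic utility, the pending-work measure M(theta_k) = k, the agent labels, and the function names are all assumptions chosen for the example; only the form of the expectation comes from the text.

```python
def expected_utility(utility, cpdf):
    """Formula (4.5): sum over states of u(theta) * p(theta | past state)."""
    return sum(u * p for u, p in zip(utility, cpdf))

def select_agent(utility, cpdfs):
    """Pick the remote agent whose current state has the highest expected utility."""
    return max(cpdfs, key=lambda name: expected_utility(utility, cpdfs[name]))

N_STATES = 11
utility = [-(k ** 2) for k in range(N_STATES)]   # hypothetical: decreasing, with growing rate of decrease
pending = list(range(N_STATES))                  # hypothetical pending-work measure M(theta_k) = k

# A_j is believed to be in theta_5 with probability 1;
# A_k is believed to be in theta_0 or theta_10, each with probability 0.5.
cpdfs = {
    "A_j": [1.0 if k == 5 else 0.0 for k in range(N_STATES)],
    "A_k": [0.5 if k in (0, 10) else 0.0 for k in range(N_STATES)],
}

if __name__ == "__main__":
    for name, cpdf in cpdfs.items():
        mean_work = sum(m * p for m, p in zip(pending, cpdf))
        print(name, mean_work, expected_utility(utility, cpdf))
    print("selected:", select_agent(utility, cpdfs))
```

Under these assumed numbers both agents have the same mean pending work, yet their expected utilities differ, so a rule based on expected utility can separate them where a rule based on the mean cannot.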

We are now ready to consider the question of how $E[u_j(y_j(t)) \mid y_j(t-\alpha_{ji})]$
behaves with increasing age $\alpha_{ji}$. Let us continue the discussion of our example, with
$y_j(t-\alpha_{ji}) = \theta_5$. Afterwards, we will generalize. First consider the extreme values
$\alpha_{ji} = 0$, and $\alpha_{ji}$ approaching $\infty$. When $\alpha_{ji} = 0$, all the probability mass in the cpdf is
at $\theta_5$; therefore, using formula (4.5), the expected utility is

$$u(\theta_5) \cdot p(y_j(t) = \theta_5 \mid y_j(t-\alpha_{ji}) = \theta_5) = u(\theta_5).$$

As $\alpha_{ji}$ approaches $\infty$, the probability mass in the cpdf is uniformly spread over each
possible state. Therefore, using (4.5) again, the expected utility will be

$$\frac{1}{11} \sum_{k=0}^{10} u(\theta_k).$$

Finally, the higher the value of $\alpha_{ji}$, the wider the even spread of the cpdf about state
$\theta_5$. Therefore, we can think of the cpdf as selecting equal parts of the utility function
values to the left and to the right of $u(\theta_5)$; how much of this function it selects grows
with $\alpha_{ji}$. Since $u(\theta_{5+k})$ decreases more rapidly than $u(\theta_{5-k})$ increases (with $k$), we
must conclude that the expected utility decreases when the age of state information
increases, as shown in Figure 4.5.


[Plot; vertical axis: Expected Utility.]

Figure 4.5. Expected Utility as Information Ages
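
The argument for a symmetric cpdf can be checked numerically with a small Python sketch. It assumes, purely for illustration, a quadratic utility u(theta_k) = -k^2 and a cpdf that is uniform over the states within a given distance of theta_5, with that distance standing in for the age of the information; neither choice is taken from the text.

```python
def uniform_spread(center, width, n_states=11):
    """Hypothetical cpdf: uniform over the states within `width` of `center`."""
    lo, hi = max(0, center - width), min(n_states - 1, center + width)
    count = hi - lo + 1
    return [1.0 / count if lo <= k <= hi else 0.0 for k in range(n_states)]

def expectation(values, cpdf):
    """Expected value of `values` under the distribution `cpdf`."""
    return sum(v * p for v, p in zip(values, cpdf))

utility = [-(k ** 2) for k in range(11)]   # assumed utility: rate of decrease grows with k
pending = list(range(11))                  # assumed pending-work measure M(theta_k) = k

if __name__ == "__main__":
    for width in range(6):                 # wider spread stands in for older information
        cpdf = uniform_spread(5, width)
        print(width, expectation(pending, cpdf), expectation(utility, cpdf))
```

In this setting the mean pending work stays at M(theta_5) for every width, while the expected utility falls as the spread widens, which is the behavior the paragraph derives.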


The astute reader will have noticed that our example was chosen very carefully:
the cpdf was defined as an even function about the given state $\theta_5$. Say that we
condition on some other state, and that the cpdf about this state is not symmetric. This
may lead to situations where the expected utility does increase with aging
information. Could it possibly make sense that the expected utility of an agent's current
state should increase with the aging of the past information on which the expectation
is based?

Again, let us consider an extreme value, but this time, for the past state on
which the expectation is conditioned in (4.5). Suppose that agent $A_i$ knows that
agent $A_j$ is currently in state $\theta_{10}$, i.e., $y_j(t) = \theta_{10}$. Then the current expected utility
is the minimal possible utility. Now consider what happens after some time $\alpha_{ji}$ goes
by: how should $A_i$ view agent $A_j$'s state $y_j(t+\alpha_{ji})$? $A_i$ would reason that, at worst,
$y_j(t+\alpha_{ji})$ is equal to $y_j(t) = \theta_{10}$. But it is also possible that $A_j$'s state has changed,
and since there are no worse states than $\theta_{10}$, it could have only changed for the better.
Therefore, it is reasonable that the expected utility of $A_j$'s state has increased with
time, so that at time $t + \alpha_{ji}$, it is a statistically better choice for $A_i$ to transfer work to
$A_j$ than it was at time $t$.
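
To make this reasoning concrete, the following Python sketch generates a family of cpdfs from an assumed model and evaluates the expected utility at increasing ages. The model (a lazy random walk over the eleven states, with moves off either end blocked) and the quadratic utility are hypothetical choices for illustration only; the text does not specify how the cpdfs arise.

```python
def step(dist, q=0.25):
    """One time step of a lazy random walk on states 0..n-1 (assumed state-change model)."""
    n = len(dist)
    new = [0.0] * n
    for k, p in enumerate(dist):
        for target in (k - 1, k + 1):
            if 0 <= target < n:
                new[target] += q * p     # move one state down or up
            else:
                new[k] += q * p          # blocked at the boundary: stay put
        new[k] += (1 - 2 * q) * p        # no change this step
    return new

def expected_utility_by_age(start, utility, max_age):
    """Expected utility of the current state, given the state `start` observed `age` steps ago."""
    dist = [1.0 if k == start else 0.0 for k in range(len(utility))]
    values = []
    for _ in range(max_age + 1):
        values.append(sum(u * p for u, p in zip(utility, dist)))
        dist = step(dist)
    return values

utility = [-(k ** 2) for k in range(11)]   # assumed utility function

if __name__ == "__main__":
    print(expected_utility_by_age(10, utility, 5))   # conditioned on theta_10
    print(expected_utility_by_age(5, utility, 5))    # conditioned on theta_5
```

Under these assumptions the expected utility conditioned on theta_10 rises with age, because the state can only have moved toward less pending work, whereas the expected utility conditioned on theta_5 falls.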

If  the  utility  function  has  a  shape  as  indicated  in  Figure  4.4,  and  the  family  of 
cpdfs  have  shapes  as  indicated  in  Figure  4.3,  the  shape  of  the  expected  utility  function 
based  on  (4.5)  is  illustrated  in  Figure  4.6. 


[Plot; axes: Expected Utility (vertical), Age (horizontal).]

Figure 4.6. Increasing Expected Utility with Age


Note that, in both examples of the behavior of the expected utility as the age of
information increases, it was implicitly assumed that the cpdf depended on age $\alpha_{ji}$,
but not on absolute time $t$. But it would be unreasonable to expect the same family
of cpdfs to model the state transitions of a real system accurately for all times,
although perhaps multiple families of cpdfs, each being an accurate model for different
periods of time, could be found. If an agent could know which cpdf family is the
correct model for any period of time, it could simply use the techniques presented so
far. Of course, we must next answer the question: "how does an agent know which
cpdf to use?" This will be answered in the next section, where we discuss the
observation decision, "when should an agent observe the system?"

In  summary,  the  expected  utility  of  remote  agent  states  gives  an  agent,  who 
wants  to  transfer  work,  a  way  of  comparing  the  merits  of  remote  agents  as  possible 
destinations. This expected utility allows past state information to be incorporated
in  an  agent’s  decisionmaking.  Although  an  agent  may  be  uncertain  about  whether 
the  past  state  information  correctly  reflects  the  current  state,  by  quantifying  this 
uncertainty  using  conditional  probability  density  functions,  and  using  state  utility,  an 
informed  decision  can  be  made. 

4.6.  Frugal  Communication

To  observe  the  state  of  a  remote  agent,  an  agent  must  obtain  information  from  it 
through  direct  or  indirect  communication,  which  takes  time.  Further,  communication 
cannot  go  on  continuously  since  it  contributes  to  overhead.  Thus,  any  information  an 
agent is sent about the state of a remote agent will experience a necessary and nontrivial
delay. Delay is the key factor contributing to an agent's uncertainty about the
global  system  state.  (Of  course,  there  may  be  other  factors,  such  as  noisy  channels, 
inaccurate  measurements,  and  so  on,  but  these  are  secondary  with  respect  to  delay. 
In  fact,  while  there  are  methods  such  as  error  correction  and  repetitive  sending  to 
solve  these  other  problems  which,  incidentally,  will  tend  to  increase  delay,  we  can 
never  completely  eliminate  delay.) 

The  question  then  becomes:  when  does  an  agent  communicate  state  information 
with  remote  agents,  i.e.,  when  does  an  agent  observe  the  system?  Clearly,  there  is  a 
tradeoff  between  communicating  too  often,  thereby  causing  a  great  deal  of  overhead, 
and  communicating  too  infrequently,  thereby  making  bad  decisions  due  to  out-of-date 
information.  We  shall  now  analyze  the  characteristics  of  this  tradeoff. 

4.6.1.  Informal  Analysis  of  Local  Loss 

For illustrative purposes, let us focus on only two distinct agents, $A_i$ and $A_j$, in
the distributed system; $A_i$ is the observing agent or the observer, and $A_j$ is the
observed agent. The observer keeps track of the other's state by communication
updates and by inference between updates. In particular, $A_i$ keeps track of the last
communicated value of $A_j$'s state, and the time that value was known to be true. ($A_j$
can send the time it recorded its state, along with the value, and for simplicity, we
will assume that both agents' clocks are synchronized.) When we speak of any loss
function, it is of a function computed by the observer.

Recall from Section 4.4 that the loss due to degradation in decision quality was
represented by the function $L_d(d(t), x(t))$, and the loss due to communication
overhead was represented by the function $L_c(\Delta k(t), x(t))$. These are global functions:
$L_d(d(t), x(t))$ is the global loss which occurs when the global (or collective) decision
$d(t)$ is made and the global low-level state is $x(t)$; $L_c(\Delta k(t), x(t))$ is the global loss
which occurs when there is a global change in information $\Delta k(t)$ and the global
low-level state is $x(t)$. For now, we would like to focus our attention on local losses: the
losses an agent experiences directly, due to degradation in the quality of its own
decisions and to the overhead it incurs from its own communications. Also, rather
than using the low-level state as a parameter, we will use the abstract state. The
local loss functions for agent $A_i$ will be denoted by $L_d(d_i(t), y_i(t))$ and
$L_c(\Delta k_i(t), y_i(t))$. Our notation implicitly distinguishes between local and global loss
functions by whether parameters are local or global variables. For example,
$L_d(d_i(t), y_i(t))$ is a local loss since $d_i$ and $y_i$ are local variables of $A_i$, whereas
$L_d(d(t), y(t))$ is a global loss since $d$ and $y$ are global variables. (In general,
$L_d(d_i(t), y_i(t))$ and $L_c(\Delta k_i(t), y_i(t))$ will not capture the total local loss experienced by
$A_i$, since the local loss functions ignore influences by other agents. We will have to
correct for this later when we generalize our analysis. For now, we will simply assume
that an agent's local losses depend only on its local variables.)


These local losses are further refined by analyzing their behavior as functions of
$\alpha_{ji}$, the age of $A_i$'s information about $A_j$, and $T_{ji}$, the period of communication
between $A_i$ and $A_j$ (which can vary over time). Although $\alpha_{ji}$ does not explicitly
appear as a parameter of $L_d(d_i(t), y_i(t))$, it is the primary variable affecting the
quality of decision $d_i(t)$. The primary variable affecting $\Delta k_i(t)$ in $L_c(\Delta k_i(t), y_i(t))$ is the
period of communication $T_{ji}$. In our analysis, we will explore the relationship
between $T_{ji}$ and $\alpha_{ji}$. (We order the subscripts $j$ followed by $i$ in $T_{ji}$ because it is $A_j$
that provides information to, and therefore influences, $A_i$ as to what value $T_{ji}$ should
have.)

It will be convenient to consider different representations of the loss functions $L_d$
and $L_c$, with their explicit parameter being either $\alpha_{ji}$ or $T_{ji}$. We will denote the
appropriate representation by superscripting either $L_d$ or $L_c$ with either $(\alpha)$ or $(T)$.
For instance, to make a statement about decision quality loss as a function of aging
information, we will use the notation $L_d^{(\alpha)}(\alpha_{ji})$. Other combinations will become clear
as we proceed. If we are making a general statement about decision quality loss or
communication overhead loss, we will continue simply to use $L_d$ or $L_c$. (Note that,
since $A_i$ needs to keep track of only a single remote agent, namely $A_j$, $\alpha_{ji}$ and $T_{ji}$
appear in the loss functions as scalar values. Later, when we generalize these
functions, the age parameter will be a vector $\boldsymbol{\alpha}_i$ representing the various ages of $A_i$'s state
information about all other agents, as in the global loss $L_d^{(\alpha)}(\boldsymbol{\alpha}_i)$, and the period
parameter will be a vector $\mathbf{T}_i$ representing the various periods of communication
between $A_i$ and all other agents, as in the global loss $L_c^{(T)}(\mathbf{T}_i)$.)


$L_d$ depends inversely on the quality of the information used as a basis for a
decision; as the information gets better, the decisionmaking gets better, and the loss $L_d$
goes down. Since an agent's information is about the past states of remote agents,
and this information is used to predict their current states, we can say that the
quality of information, and therefore the quality of decisionmaking, decreases
monotonically with the age of the information. Thus, the loss $L_d^{(\alpha)}(\alpha_{ji})$ increases
monotonically with $\alpha_{ji}$. $L_d^{(\alpha)}(\alpha_{ji})$ should eventually flatten as $\alpha_{ji}$ approaches infinity: as the
age of state information gets very large, the information becomes useless, since it offers
no clue about the current state. When this point is reached, further aging makes no
difference in the usefulness (or uselessness) of the information.


Therefore, we may conclude that $L_d^{(\alpha)}(\alpha_{ji})$ is characterized by a curve with the
following properties:

(1) it is positive and monotonically increasing;

(2) its first derivative asymptotically approaches zero.

For example, we expect $L_d^{(\alpha)}(\alpha_{ji})$ to have the general shape shown in Figure 4.7.

Figure 4.7. Decision Quality Loss vs. Information Age


Let us assume that $A_i$ communicates on a periodic basis with $A_j$, i.e., that there
is a definite inter-communication period between agents. Let this period be $T_{ji}$.
(Later, we will relax the assumption by allowing $T_{ji}$ to vary.) Given periodic
updates, the age of information about a remote agent is a saw-toothed function of
time, age(t), like that shown in Figure 4.8.


Figure  4.8.  Age  of  Information  with  Periodic  Communication 


$T_t$, the transmission time, is the minimal age of information from agent $A_j$.
age(t) increases linearly with $t$ until receipt of new information, replacing the old
information, causes the age to begin at $T_t$ again.
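
The saw-toothed behavior can be written down directly. The Python sketch below assumes, for illustration, that updates are received exactly at times 0, T, 2T, and so on, each already T_t old on arrival; the function and parameter names are hypothetical.

```python
def age(t, period, t_trans):
    """Age of information at time t, assuming an update arrives at t = 0, period, 2*period, ..."""
    return t_trans + (t % period)

if __name__ == "__main__":
    # Sample the sawtooth for an assumed period of 10 and transmission time of 1.
    for t in (0, 5, 9, 10, 15, 25):
        print(t, age(t, period=10, t_trans=1))
```

Each arrival resets the age to the transmission time, after which it climbs linearly until the next arrival.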

On the basis of the assumption we made about how the decision quality loss
function varies with the age of information, we can now determine how
$L_d^{(\alpha)}(\mathrm{age}(t))$ varies in time, assuming a communication period of length $T_{ji}$. This is
illustrated in Figure 4.9.


Figure  4.9.  Decision  Quality  Loss  with  Periodic  Communication 


The quantity $L_t$ in Figure 4.9 identifies the minimum decision quality loss. Note
that, since the minimum age for information in the diagram of Figure 4.8 is $T_t$, the
decision quality loss is at least

$$L_t = L_d^{(\alpha)}(\mathrm{age}(n \cdot T_{ji})) = L_d^{(\alpha)}(T_t), \quad n \in \{0, 1, 2, \ldots\}.$$

Thus, there will always be some positive loss, since information is not received
instantaneously.

We can now compute the decision quality loss as a function of the communication
period $T_{ji}$, which is a time-average of $L_d^{(\alpha)}(\alpha_{ji})$, given by the formula

$$L_d^{(T)}(T_{ji}) = \frac{1}{T_{ji}} \int_{T_t}^{T_t + T_{ji}} L_d^{(\alpha)}(\alpha)\, d\alpha. \quad (4.6)$$


The general shape of this function is similar to that of $L_d^{(\alpha)}(\alpha_{ji})$, except that it
increases more slowly. The reason for this is simple: $L_d^{(\alpha)}(T_{ji})$ is the decision quality
loss due to using information which is $T_{ji}$ time units old; $L_d^{(T)}(T_{ji})$ is the average loss
in the quality of decisionmaking due to the use of information which is between $T_t$
and $T_{ji} + T_t$ units old, with the assumption that $T_t \ll T_{ji}$. This assumption is
reasonable: the transmission time between two agents should be much less than the
communication period.
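
The time average in formula (4.6) can be approximated numerically. The Python sketch below does so with a midpoint rule, using an assumed loss curve 1 - exp(-alpha/tau) that is positive, increasing, and flattening, as required of the decision quality loss; the curve, its parameter tau, and the function names are illustrative assumptions rather than quantities defined in the text.

```python
import math

def decision_loss(alpha, tau=10.0):
    """Hypothetical L_d^(alpha): increasing in age, with a derivative that tends to zero."""
    return 1.0 - math.exp(-alpha / tau)

def average_decision_loss(period, t_trans, loss=decision_loss, steps=10_000):
    """Midpoint-rule approximation of formula (4.6) over ages [t_trans, t_trans + period]."""
    h = period / steps
    total = sum(loss(t_trans + (i + 0.5) * h) for i in range(steps))
    return total * h / period

if __name__ == "__main__":
    for period in (5.0, 20.0, 80.0):
        avg = average_decision_loss(period, t_trans=0.5)
        print(period, avg, decision_loss(period + 0.5))   # average vs. loss at the oldest age
```

For an increasing loss curve the average over a period is never larger than the loss at the oldest age in that period, which is one way to see why the averaged function rises more slowly.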

Now let us consider the communication overhead loss, $L_c$. Communication
overhead decreases monotonically with the period $T_{ji}$. For very small periods, we expect a
large amount of overhead, and therefore a large loss. In fact, as the period approaches
zero, the loss goes to infinity since there will be no time to do useful work. As the
period approaches infinity, the rate of communication goes to zero, so the loss should
go to zero. Therefore, we may conclude that $L_c^{(T)}(T_{ji})$ is characterized by a curve
with the following properties:

(1) it is monotonically decreasing;

(2) $\lim_{T_{ji} \to 0} L_c^{(T)}(T_{ji}) = \infty$ and $\lim_{T_{ji} \to \infty} L_c^{(T)}(T_{ji}) = 0$.

For example, we expect $L_c^{(T)}(T_{ji})$ to have the general shape shown in Figure 4.10.


[Figure 4.10: plot of $L_c^{(T)}$ not reproduced.]


In Figure 4.10, we have identified a specific value for the period $T_{ji}$, $T_{ji}^{*}$, which
we refer to as the optimal communication period between the two agents $A_i$ and $A_j$.
Recall that our goal is to determine when an agent should observe the system. To do
this, we need to explore how to determine $T_{ji}^{*}$.

We have two expressions, $L_d^{(T)}(T_{ji})$ and $L_c^{(T)}(T_{ji})$, which characterize the
decision quality loss as a function of period $T_{ji}$, and the loss due to communication
overhead as a function of $T_{ji}$, respectively. The sum of these functions gives the total loss
due to degradation in decision quality and to communication overhead. The goal is
then to find the minimum of this sum; the corresponding period $T_{ji}^{*}$ is the optimal
communication period.
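
A hedged numerical sketch of this minimization follows, in Python. Both loss curves are assumptions made for illustration: the decision quality term is the closed-form time average of 1 - exp(-alpha/tau) over ages from T_t to T_t + T, and the overhead term is cost/T, which is infinite at zero and vanishes at infinity. The grid search and all names are hypothetical, not a method given in the text.

```python
import math

def avg_decision_loss(period, t_trans=0.1, tau=10.0):
    """Closed-form time average of 1 - exp(-a/tau) for a in [t_trans, t_trans + period]."""
    return 1.0 - (tau / period) * math.exp(-t_trans / tau) * (1.0 - math.exp(-period / tau))

def comm_loss(period, cost):
    """Hypothetical overhead loss: decreasing in the period, unbounded as the period goes to 0."""
    return cost / period

def optimal_period(cost, periods):
    """Grid search for the period minimizing the total loss.

    Returns None when the total loss is still falling at the largest candidate,
    i.e. when no interior minimum was found on the grid.
    """
    best = min(periods, key=lambda T: avg_decision_loss(T) + comm_loss(T, cost))
    return None if best == periods[-1] else best

periods = [0.1 * n for n in range(1, 2001)]   # candidate periods 0.1 .. 200.0

if __name__ == "__main__":
    print(optimal_period(cost=1.0, periods=periods))    # cheap communication
    print(optimal_period(cost=20.0, periods=periods))   # expensive communication
```

With these assumed curves, cheap communication yields an interior minimizing period, while sufficiently expensive communication leaves the total loss decreasing over the whole grid, so no finite period is returned.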

What insights about the existence of $T_{ji}^{*}$ can we draw from an informal
analysis of the general shapes of $L_d^{(T)}(T_{ji})$ and $L_c^{(T)}(T_{ji})$? Since we are looking to
minimize $L_c^{(T)}(T_{ji}) + L_d^{(T)}(T_{ji})$, we need to solve the equation

$$\frac{d}{dT_{ji}} \left[ L_c^{(T)}(T_{ji}) + L_d^{(T)}(T_{ji}) \right] = 0. \quad (4.7)$$

If this equation has a solution (i.e., there exists a value for $T_{ji}$ for which this equation
is true), that would constitute a minimum point for $L_c^{(T)}(T_{ji}) + L_d^{(T)}(T_{ji})$. If there
are multiple solutions (multiple local minima), we want the one which produces the
global minimum value. To simplify our discussion, we will assume that there is only
one minimum point, and therefore, there is a single solution to (4.7). (Note that if
(4.7) has a solution, it must be at a minimum point, as the sum $L_c^{(T)}(T_{ji}) + L_d^{(T)}(T_{ji})$
cannot have a maximum point. This is because $L_c^{(T)}(T_{ji})$ is infinite when $T_{ji}$ is 0.)

Let us consider the conditions for which (4.7) will or will not have a solution.
Consider the first case:

$$\left| \frac{dL_c^{(T)}(T_{ji})}{dT_{ji}} \right| > \left| \frac{dL_d^{(T)}(T_{ji})}{dT_{ji}} \right| \quad \text{for } T_{ji} < \tau,$$

$$\left| \frac{dL_c^{(T)}(T_{ji})}{dT_{ji}} \right| < \left| \frac{dL_d^{(T)}(T_{ji})}{dT_{ji}} \right| \quad \text{for } T_{ji} > \tau.$$

In this case, when the communication period is below some threshold $\tau$, the rate at
which communication overhead loss decreases is greater than the rate at which
decision quality loss increases. Above this threshold, the opposite is true. Under these
conditions, $L_c^{(T)}(\tau) + L_d^{(T)}(\tau)$ is a minimum value, and therefore $T_{ji}^{*} = \tau$. This is
illustrated in Figure 4.11.


Figure  4.11.  Case  1:  Sum  of  Losses  with  Minimum  Point 

Summarizing case 1, since there is a global minimum point for the total loss, the
period $T_{ji}^{*}$ is the optimal communication period between $A_i$ and $A_j$. That is the
point where the tradeoff between degradation in decision quality due to aging information
and overhead due to communication is optimized.

Now  consider  the  second  case:  the  sum  does  not  have  a  minimum  point,  as  shown 
in  Figure  4.12. 


Figure  4.12.  Case  2:  Sum  of  Losses  with  no  Minimum  Point 


This condition implies that, the longer the period of communication $T_{ji}$, the
smaller the loss incurred by an agent. Therefore, in this case, it is simply better not
to communicate at all!

Case 2 arises in distributed environments where states change rapidly relative to
communication time. By the time an agent receives communicated state information
from a remote agent, that information is useless since the state has changed many
times during the transmission. The new state may depend very little on the past state
which was communicated. In such a case, it makes sense not to communicate at all,
since communication provides no useful information but does add overhead. Agents
should instead base decisions on the limiting probability distributions of remote agent
states, assuming that they exist and are known (e.g., the system's steady state
behavior can be modeled and analyzed).

The more interesting situation is case 1, where there is some optimal period $T_{ji}^{*}$
for communication. Note that $T_{ji}^{*}$ may vary, depending on how the loss functions
vary with time. Thus, for any decentralized control application, these loss functions
must be determined so that the communication period $T_{ji}^{*}$ can be computed
dynamically.
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We  make  the  final  observation  that  L^(Tjj)  is  a  slow  changing  function,  as  it  is 
a  time-smoothed  average  of  (07,).  Thus,  the  sensitivity  of  the  optimal  communi¬ 
cation  period 

Ty,  =  min[l//*  (  T;1)  +  L[T^{  Tp)] 

to  variations  in  i'/’dV.)  over  relatively  short  time  intervals  is  small.  The  point  is 
that  we  cjj.n  approximate  and  know  that,  as  long  as  the  best  communication 

period  T  ,,  as  produced  by  this  approximation,  is  close  to  the,  theoretical  optimal 
value  Tji,  the  actual  difference  between  the  loss  using  period  T Jt  and  the  loss  using 
period  T;y,  will  be  small. 
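
One way to see why a nearby period costs little is a second-order expansion around the minimum. The sketch below goes beyond what the text states: it assumes that the total loss $L(T)$ (decision quality loss plus communication loss) is twice differentiable and that the optimal period, written $T^*_{ji}$, is an interior minimum, so that the first derivative vanishes there. $\hat{T}_{ji}$ denotes the period produced by an approximation.

```latex
L(\hat{T}_{ji}) - L(T^{*}_{ji})
  \approx \tfrac{1}{2}\, L''(T^{*}_{ji})\, \bigl(\hat{T}_{ji} - T^{*}_{ji}\bigr)^{2},
\qquad \text{since } L'(T^{*}_{ji}) = 0 .
```

Under these assumptions the excess loss is quadratic in the period error, so a small error in the period produces a much smaller error in the loss.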

Up  to  this  point,  we  have  considered  the  tradeoff  between  decision  quality  and 
communication  overhead  in  general  terms  to  gain  insight  about  when  communication 
between  agents  should  take  place.  We  now  analyze  the  loss  functions  themselves  in 
detail,  first  formalizing  them,  identifying  their  parameters,  and  then  considering  how 
they  may  be  evaluated  efficiently  through  the  use  of  approximations. 

4.6.2.  Decision  Quality  Loss  Function 

We  begin  by  considering  the  global  decision  quality  loss  function.  Our  goal  is  to 
develop  a  formula  for  it,  using  the  formalism  presented  in  Chapter  3.  This  formula 
will  be  complete,  but  unusable  due  to  its  complexity.  We  then  bridge  the  gap 
between  theory  and  practice  by  providing  a  much  simpler  approximate  formula  for 
the  local  loss. 

For any global state $x(t)$, there is a best collective decision $d^*(t)$ that will optimize some global objective function. But agent $A_i$ does not know $x(t)$; rather, it has its own view of the global state, which is $\kappa_i(t)$. Therefore, $A_i$ makes decision $d_i(t) = \gamma_i(z_i(t), s_i(t))$ (see Section 3.2 for definitions of these variables), which is part of the collective decision $d(t) = (d_1(t), \ldots, d_N(t))$. To compare the quality of decisionmaking, it would be desirable to quantify the goodness of decision $d(t)$ relative to the best possible $d^*(t)$. One way of doing this is to evaluate and compare the consequences of these two decisions.

Given state $x(t)$, decision $d(t)$ causes the next state $x(t+1) = f(x(t), d(t))$, and decision $d^*(t)$ causes the next state $x^*(t+1) = f(x(t), d^*(t))$. The difference between the consequences of $d(t)$ and of $d^*(t)$ may be defined as the expected difference in the utilities of the respective next states,

$$E[u(x(t+1)) - u(x^*(t+1))],$$

or as the expected difference in the utilities of the respective next sequences of states,

$$E[u(x(t+1, t+\tau)) - u(x^*(t+1, t+\tau))], \quad \tau > 1.$$

This difference is the loss due to degradation in decisionmaking quality, and is due to agents not knowing the global state $x(t)$, and having different, possibly conflicting, views of the global state.

Although this definition of loss makes sense, no agent can compute it, since it knows neither the current global state, nor the current actions performed by the other agents. Note that we did assume that every agent knows the decision rules $\gamma_j$ for all $j$, which is reasonable in systems where agents are willing to cooperate, i.e., in the systems of interest here.

What agents can do is to compute expected losses over all possible global states, influences, and private inputs, conditioned on the information they have. To develop a formula for this, we need some additional notation. Recall from Section 3.2 that a decision $d_i$ is based on the decision rule $\gamma_i(z_i, s_i)$, where $z_i$ is the influence of other agents on $A_i$, and $s_i$ is $A_i$'s generated work. Influence has two components: information influence and work influence. $A_i$'s information influence variable $\kappa_i$, which contains information about the states of other agents, takes on values from the global state space,

$$X = X_1 \times X_2 \times \cdots \times X_N.$$

Define the global information influence variable $\kappa$ (with no subscript), which contains the information influence of every agent,

$$\kappa = (\kappa_1, \kappa_2, \ldots, \kappa_N),$$

and takes on values from the set

$$X^N = X \times X \times \cdots \times X,$$

where there are $N$ copies of $X$. If $x$ is a particular global state (i.e., $x \in X$), then the product $x^N$ is $x \times x \times \cdots \times x$, where there are $N$ copies of $x$. Let

$$\kappa = x^N$$

represent the situation where every agent's local information influence is set to the same global state, namely $x$. Similar to information influence, $A_i$'s work influence variable $w_i$, which contains the transferred work from all other agents, takes on values from the global work space,

$$W = W_1 \times W_2 \times \cdots \times W_N,$$

and each $W_i$ is the work space of agent $A_i$. Define the global work influence variable $w$, which contains the work influence of every agent, as

$$w = (w_1, w_2, \ldots, w_N);$$

$w$ takes on values from the product

$$W^N = W \times W \times \cdots \times W,$$

where there are $N$ copies of $W$. Finally, define the global generated work variable $s$, which contains the newly generated work (not the transferred work) arriving at each agent,

$$s = (s_1, s_2, \ldots, s_N),$$

and takes on values from $W$.

Continuing with the development of a formula for decision quality loss, agent $A_i$ can compute the expected maximum global utility $U^*(a_i)$, defined as follows:

$$U^*(a_i) = \sum_{x \in X} \sum_{w \in W^N} \sum_{s \in W} u\big(f(x, \gamma((w, x^N), s))\big)\, p(x, (w, x^N), s \mid \kappa_i). \qquad (4.8)$$

In (4.8), we are taking the expected value of the utility of the next state, over the global state space, over the global work influence space, and over the global generated-work space. Since $\kappa = x^N$, every agent's view of the global state is the same, and the value of the global state as viewed by everyone (which, in general, is not the same as the actual global state) is $x$. (Note that this does not automatically imply that $x$ is common knowledge (see Section 3.5). That $x$ is common knowledge must be assumed separately.) For each value of $x$, $f(x, \gamma((w, x^N), s))$ will produce the best possible next state in terms of utility (since the decision is based on perfect knowledge of the global state); therefore, $U^*(a_i)$ is the maximum expected utility based on $\kappa_i$. Note that $\kappa_i(t)$ will equal $(x_1(t - a_{1i}), x_2(t - a_{2i}), \ldots, x_N(t - a_{Ni}))$, where $a_{ji}$ is the age of the most recent communication from agent $A_j$ to $A_i$. Consequently, we use the vector of ages

$$a_i = (a_{1i}, a_{2i}, \ldots, a_{Ni})$$

as the parameter to $U^*$.

Computation of (4.8) assumes that every agent $A_i$ knows the conditional probability density function (cpdf) $p(x, (w, x^N), s \mid \kappa_i)$, which, in general, is an unreasonable assumption. We will elaborate shortly on how this cpdf can be simplified to be used in a real system.

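The expected maximum utility $U^*(a_i)$ is then compared with $U(a_i)$, simply the expected utility, defined as follows:

$$U(a_i) = \sum_{x \in X} \sum_{w \in W^N} \sum_{\kappa \in X^N} \sum_{s \in W} u\big(f(x, \gamma((w, \kappa), s))\big)\, p(x, (w, \kappa), s \mid \kappa_i). \qquad (4.9)$$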

Again, we are taking an expected value over the global state space, over the global work influence space, and over the global generated work space, but also over the global information influence space ($\kappa$ is varied over this space). In contrast to $d^*$, the decision $d$ is based on imperfect knowledge of the true global state. This will usually produce a suboptimal decision, and consequently a suboptimal next state in terms of utility. (More precisely, the next state will be statistically suboptimal, due to the stochastic nature of the distributed system. For example, it is possible that, for a particular set of inputs for each agent (e.g., new work arrivals), which cannot be predicted beforehand, what is a suboptimal decision in general is actually optimal for this specific case.)

There are two noteworthy properties regarding the relationship between $U^*(a_i)$ of (4.8) and $U(a_i)$ of (4.9). The first property is that

$$U(a_i) \le U^*(a_i),$$

and therefore $U(a_i) - U^*(a_i)$ is a non-positive value, indicating an expected drop in utility. This is because, for given values of $x$, $w$, and $s$, $u(f(x, \gamma((w, \kappa), s)))$ is maximum when $\kappa = x^N$, i.e., when every agent has the same global state information. Assuming the optimal case, where for all possible values of $x$, $w$, and $s$, the probability that $\kappa = x^N$ is 1, then the value of $U(a_i)$ is maximal, and is exactly $U^*(a_i)$.

The expected drop in utility is what we define as the expected decision quality loss

$$L_i^{(D)}(a_i) = U^*(a_i) - U(a_i). \qquad (4.10)$$

This is our first approximation to the exact global decision quality loss $L_d(d(t), x(t))$.


The second property is that

$$U^*(a_i) - U(a_i) \ \text{increases with } a_i,$$

as this difference should if it is to model decision quality loss. The reason is clear: as age increases, the probability of an agent successfully predicting the current global state goes down, which increases the probability of making bad decisions. Indeed, the decision rule $\gamma$ should be constructed so that the probability of selecting bad decisions grows as slowly as possible with the age of information.

4.6.3.  Observations  and  Simplifications 

In theory, an agent can determine the expected decision quality loss $L_i^{(D)}(a_i)$ using (4.10). However, this cannot be done while maintaining our goal of fast decisionmaking. (This is in contrast to the quantity we are trying to approximate, $L_d(d(t), x(t))$, which cannot be computed, not because it would be computationally inefficient, but because agents cannot know $x(t)$ and cannot theoretically compute $d^*(t)$.)

Recall  the  complete  and  exact  global  loss  function  as  presented  in  Section  4.4: 


$$L(t) = L_d(d(t), x(t)) + L_c(\Delta\kappa(t), x(t)) + L_e(\Delta x(t), \kappa(t)) + L_r(s(t), x(t), d(t)).$$

We have concentrated on $L_d$ and $L_c$ because they will change dynamically, whereas $L_e$ will be relatively constant once the decision rule $\gamma$ is defined, and similarly $L_r$ is a fixed loss due to the inherent stochastic nature of the system. Requiring an agent to compute $L_i^{(D)}(a_i)$ as defined above would make $L_e$ extremely large, and consequently the total loss would become extremely large. Since we want to minimize the total loss, we need cheap ways to determine $L_d$ and $L_c$ dynamically during system operation so that $L_e$ is kept low.

To make $L_i^{(D)}(a_i)$ cheap to compute, and yet a reasonable approximation to $L_d(d(t), x(t))$ so that we can attain our ultimate goal of determining the vector of near-optimal communication periods $\mathbf{T}_i = (T_{1i}, T_{2i}, \ldots, T_{Ni})$, we will now identify the information we need, and the first-order effects of $L_d$.

As was made clear in the informal analysis of local loss presented in Section 4.6.1, we are interested mainly in the shape of $L_d$, in particular its rate of increase relative to the rate of decrease of $L_c$, so that a minimum point in the sum of losses can be found. There is the underlying assumption that such a minimum point exists; otherwise it would not make sense for agents to communicate at all, contradicting the purpose of computing these losses. The major factor contributing to the shape of $L_d$ is the cpdf $p(x(t) \mid \kappa_i(t))$. Specifically, the rate at which the probability mass spreads from the known past state to other states provides a good indicator of the rate at which $L_i^{(D)}(a_i)$ rises. The faster this spread occurs, the more uncertainty there is in the current state based on past information, and therefore the faster the rise in the loss due to decisionmaking quality.
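
The spreading of probability mass can be illustrated with a small Markov chain. This is a sketch under assumptions that are not part of the text: the remote agent's state is modeled as a three-state chain with a made-up transition matrix `P`, and the age of information is counted in transition steps. The code tracks how much probability remains on the last reported state as the report ages.

```python
def step(dist, P):
    """Advance a state distribution by one transition of the chain P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def belief_after(known_state, age, P):
    """Distribution over current states given the state observed `age` steps ago."""
    dist = [1.0 if s == known_state else 0.0 for s in range(len(P))]
    for _ in range(age):
        dist = step(dist, P)
    return dist


# Hypothetical 3-state remote agent (e.g. light / medium / heavy load).
P = [
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
]

# Probability that the remote agent is still in the reported state 0,
# for report ages 0 through 5.
still_there = [belief_after(0, age, P)[0] for age in range(6)]
```

In this toy chain the entries of `still_there` fall steadily from 1.0 toward the chain's limiting probability for state 0; a faster fall would indicate a faster rise in decision quality loss.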

$U^*(a_i)$ and $U(a_i)$ in (4.8) and (4.9) depend on not only the global state variable $x$, but also the work and information influence variables, $w$ and $\kappa$, and the generated work $s$. We will argue that the effects of $w$ and $s$ on the shape of $L_d$ are secondary to that of $p(x(t) \mid \kappa_i(t))$ for most situations, as the rates at which work is transferred and new work arrives are expected to be much smaller than the inter-agent communication rate (otherwise, as we argued in Section 4.6.1, many state changes would take place within one communication period, which goes against our assumptions). Likewise, the effects of most components of the global information variable $\kappa$ are secondary in that most decisions are not based on the entire global state, but rather on a small set of local states, particularly that of the decisionmaking agent and some specific remote agent (e.g., if the decision is to transfer work, the state of the remote agent who is to receive the work is of utmost interest).

Yet, there are some situations where $w$, $s$, and especially $\kappa$, can have a major
effect on the shape of $L_d$. For some small collection of views which agents possess
about  each  other,  the  resulting  decisions  based  on  these  views  might  conflict  to  such  a 
high  degree  that  they  will  cause  the  system  to  go  into  very  undesirable  global  states, 
i.e.,  those  with  very  low  utility.  In  fact,  this  may  result  not  only  because  of  differing 
views,  but  also  from  some  collections  of  work  transfers  or  new  work  arrivals.  A  typi¬ 
cal  example  of  this  is  when  all  agents  happen  to  transfer  work  to  a  single  agent,  which 
was  viewed  by  each  one  as  being  the  most  desirable  agent  to  whom  work  should  be 
transferred.  The  problem  is  that  each  agent  does  not  expect  that  every  other  agent 
also  sees  this  single  agent  as  the  most  desirable.  (Actually,  this  may  be  the  result  no 
matter  whether  agents  have  the  same  or  differing  views;  the  point  is  that,  whatever 
these  views  were,  they  led  to  all  agents  finding  the  same  single  agent  as  the  most 
desirable  destination  of  work.)  From  a  single  agent’s  perspective,  its  decision  to 
transfer work to what it considers to be the most desirable destination agent may
make  complete  sense,  except  for  the  situation  where  every  other  agent  comes  to  the 
same  unexpected  conclusion  (i.e.,  unexpected  by  each  single  agent).  We  call  such  a 
situation  a  resonance,  and  will  deal  with  it  separately  in  Section  4.7. 
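
A resonance is easy to reproduce in a toy setting. The sketch below is purely illustrative: the load values, the "send one task to the least loaded agent" rule, and the function names are assumptions, not the decision rule of this work. Every agent acts sensibly on its own view, yet the collective outcome overloads a single destination.

```python
def choose_target(view, me):
    """Pick the remote agent that looks least loaded in this agent's view."""
    others = [a for a in range(len(view)) if a != me]
    return min(others, key=lambda a: view[a])


def transfer_round(loads, views):
    """Every agent with more than one task sends one task to its chosen target."""
    new_loads = list(loads)
    for me, view in enumerate(views):
        if loads[me] > 1:
            target = choose_target(view, me)
            new_loads[me] -= 1
            new_loads[target] += 1
    return new_loads


# Hypothetical system: ten agents, agent 0 idle, every other agent holds 4 tasks.
loads = [0] + [4] * 9
views = [list(loads) for _ in loads]  # here all agents hold the same view
after = transfer_round(loads, views)
```

After one round, agent 0 holds nine tasks while every other agent holds three, so the formerly idle agent has become the most heavily loaded one.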
4.6.4.  Approximations

The purpose of the discussion above is to provide guidelines for constructing approximations based on simplifying (4.10), which is itself an approximation for $L_d(d(t), x(t))$. We now offer four simpler approximations. Each successive approximation depends on a smaller number of factors, making it easier to compute, but potentially introducing more error. Selection of the best approximation will depend on the application, and particularly on the size of the distributed system and the degree of dependence among the actions of agents.


First  Approximation 

The first approximation distinguishes between primary and secondary variables in (4.8) and (4.9), replacing summations over secondary variables with average values, and it considers only abstract states in $Y = Y_1 \times Y_2 \times \cdots \times Y_N$, rather than low-level states.

$$U^*(a_i) = \sum_{\psi \in Y} u\big(f(\psi, \gamma((\bar{w}, \psi^N), \bar{s}))\big)\, p(\psi \mid \kappa_i) \qquad (4.11)$$

$$U(a_i) = \sum_{\psi \in Y} u\big(f(\psi, \gamma((\bar{w}, \kappa_i^N), \bar{s}))\big)\, p(\psi \mid \kappa_i) \qquad (4.12)$$

Thus, the first approximation for $L_i^{(D)}(a_i)$ is

$$L_i^{(D)}(a_i) \approx U^*(a_i) - U(a_i) \qquad (4.13)$$

Formulas (4.11) and (4.12) differ from (4.8) and (4.9), respectively, in that all factors in the formulas deemed secondary in the discussion above, namely work influence $w$, information influence $\kappa$, and newly generated work $s$, have been replaced by expected values. Thus, $\bar{w}$ is the expected amount of work to be transferred, which may be a static value based on an analysis of complete past histories, or a value dynamically recomputed on the basis of an analysis of recent past histories. Similarly, $\bar{s}$ is the expected amount of newly generated work. For information influence, we use $\psi^N$ (i.e., every agent knows the same global state on which decisions are based) in (4.11), and we use $\kappa_i^N$ (i.e., agent $A_i$ believes all agents share the same view, in particular the view of $A_i$, but this may be different from what the global state really is) in (4.12). Again, this ignores the problem of resonances, where conflicting decisions are made, arising from special combinations of values for $w$, $s$, and $\kappa$, but we solve this problem separately. Our main focus here is to establish a cheap way of determining the shape of $L_i^{(D)}(a_i)$, so that eventually a good communication period can be determined.

If the distributed system is very large, the loss approximation defined by (4.13) is still too time-consuming to compute because of the range of values over which the summation variable $\psi$ will vary. For example, if there are one hundred agents, and the size of each agent's state space is two, $\psi$ will range over $2^{100}$ possible values. Thus, the approximation is useful only if the size of the distributed system is small and each agent's state space is small. Otherwise, we must simplify our formulas further.
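
To give a sense of scale, the short sketch below counts the terms in a sum over all global states. The evaluation rate of one billion terms per second is a hypothetical figure chosen only to make the point; the function name is likewise illustrative.

```python
def global_state_count(num_agents, states_per_agent):
    """Number of global states when every agent has the same local state count."""
    return states_per_agent ** num_agents


small = global_state_count(5, 2)     # 32 terms: easy to enumerate
large = global_state_count(100, 2)   # 2**100 terms, as in the example above

# Rough time to evaluate every term at a hypothetical billion terms per second.
SECONDS_PER_YEAR = 365 * 24 * 3600
years = large / 1e9 / SECONDS_PER_YEAR
```

Even at that optimistic rate, `years` comes out in the tens of trillions, so exhaustive summation over the global state space is out of the question for systems of this size.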

Second  Approximation 

In the second approximation, we will consider the loss due to information aging on an agent-by-agent basis, as opposed to the previous approximation, which was on a system-wide basis and accounted for all agents simultaneously. The loss is still global, however: it represents a degradation in decision quality for the entire distributed system. In particular, this approximation is the global loss contributed by the consequences of decisions made by an agent $A_i$ which directly affect a single remote agent $A_j$.

$$U^*_{D_{ij}}(a_i) = \frac{1}{\beta} \sum_{\delta_i \in D_{ij}} \sum_{\psi \in \Gamma_i(\delta_i)} u\big(f(\psi, d_0^{\,i-1}\, \delta_i\, d_0^{\,N-i})\big)\, p(\psi \mid \kappa_i) \qquad (4.14)$$

$$U_{D_{ij}}(a_i) = \frac{1}{\beta} \sum_{\delta_i \in D_{ij}} \sum_{\psi \in \Gamma_i(\delta_i)} u\big(f(\psi, d_0^{\,i-1}\, d_i\, d_0^{\,N-i})\big)\, p(\psi \mid \kappa_i) \qquad (4.15)$$

Thus, the second approximation for $L_i^{(D)}(a_i)$ is

$$L_i^{(D)}(a_i) \approx U^*_{D_{ij}}(a_i) - U_{D_{ij}}(a_i) \qquad (4.16)$$

$D_{ij}$ is the set of all the decisions in agent $A_i$'s decision set which are considered to have a direct effect on agent $A_j$. (Since these are decisions with which $A_i$ will influence $A_j$, in the subscript $i$ is followed by $j$.) Thus, $D_{ij} \subset D_i$, and, if $\delta_i \in D_{ij}$, then $g_{ij}(\delta_i) \ne w_0$. Recall from Section 3.2 that $g_{ij}$ is the work function, mapping decisions by $A_i$ to work $w_{ij}$ appearing at agent $A_j$, and that $w_0$ denotes "no work."

$\Gamma_i(\delta_i)$ is the set of all global states such that if $\psi \in \Gamma_i(\delta_i)$, $A_i$ would make decision $\delta_i$. Recall that a decision is a function of work transfers, state information, and generated work. Thus, from another viewpoint, $\Gamma_i$ is $\gamma_i^{-1}$, the inverse of $A_i$'s decision rule, for given values of $w_i$ and $s_i$.

Finally, recall that $d_0$ is the null decision, meaning that the agent decides to simply do nothing. Then the product $d_0^{\,i-1}\, \delta_i\, d_0^{\,N-i}$ is the set of decisions where agents $A_j$, for $j < i$, make the null decision, agent $A_i$ makes decision $\delta_i$, and agents $A_k$, for $k > i$, make the null decision. The product $d_0^{\,i-1}\, d_i\, d_0^{\,N-i}$ has the same meaning, except that $A_i$ makes the usual decision $d_i$ based on $\kappa_i$, rather than $\delta_i$, which is an element of $D_{ij}$.

Let us analyze $U^*_{D_{ij}}(a_i)$ and $U_{D_{ij}}(a_i)$ in (4.14) and (4.15). First, they are global expected utilities since the next state function $f$ is global (this is in contrast with the next two approximations, which make use of local expected utilities). Second, the expectations are only over global states that would trigger decisions affecting $A_j$. In fact, since the expectations are only over these states, a normalization factor $1/\beta$ is necessary, where

$$\beta = \sum_{\delta_i \in D_{ij}} \sum_{\psi \in \Gamma_i(\delta_i)} p(\psi \mid \kappa_i) = \mathrm{Prob}\Big\{ y \in \bigcup_{\delta_i \in D_{ij}} \Gamma_i(\delta_i) \Big\}.$$

$\beta$ is the probability that the current global state $y$ is a member of the set of possible states which would trigger any decision in $D_{ij}$.

The loss approximation defined by (4.16) is useful because it considers global decision quality loss on an agent-by-agent basis, with the assumption that a decision has a direct effect on only one remote agent. Given this, along with loss due to communication overhead on an agent-by-agent basis, the optimal communication period between each pair of agents can be determined. Thus, $A_i$ can determine the frequency of communication between itself and every remote agent $A_j$, for all $j$. Also, the number of states to consider may be significantly less than the entire global state space, which was a problem with the previous approximation.

Unfortunately, (4.16) requires knowledge of the global next state function $f$, and of the cpdf $p(\psi \mid \kappa_i)$, which are generally unavailable. Also, although the size of $D_{ij}$ will be small, the number of global states in $\Gamma_i(\delta_i)$ may still be quite large.


Third  Approximation 

The third approximation considers local losses, the losses of the decisionmaker $A_i$ and the remote agent $A_j$ affected by the decision, rather than an overall global loss. First, we need some additional definitions.

Let $c(u_i, u_j)$ be a real-valued averaging function of $u_i$ and $u_j$, which are real numbers representing local utilities of the decisionmaking agent $A_i$, and of some remote agent $A_j$. For example, $c(u_i, u_j)$ may be the arithmetic mean $(u_i + u_j)/2$, or the geometric mean $(u_i u_j)^{1/2}$, or the root mean square $((u_i^2 + u_j^2)/2)^{1/2}$.
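
The three example averaging functions can be written down directly. This is a minimal sketch with hypothetical function names and made-up utility values; the root mean square uses its standard definition, the square root of the mean of the squares.

```python
import math


def arithmetic_mean(u_i, u_j):
    """Plain average of the two local utilities."""
    return (u_i + u_j) / 2


def geometric_mean(u_i, u_j):
    """Square root of the product; meaningful only for non-negative utilities."""
    return math.sqrt(u_i * u_j)


def root_mean_square(u_i, u_j):
    """Square root of the mean of the squared utilities."""
    return math.sqrt((u_i ** 2 + u_j ** 2) / 2)


# Hypothetical local utilities for the decisionmaker and the remote agent.
u_i, u_j = 2.0, 8.0
means = (arithmetic_mean(u_i, u_j),
         geometric_mean(u_i, u_j),
         root_mean_square(u_i, u_j))
```

For unequal non-negative inputs the geometric mean is the smallest of the three and the root mean square the largest, so the choice determines how strongly one agent's low utility pulls down the combined value.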

If $\psi$ is a global state, let $[\psi]_j$, the $j$th component of $\psi$, be the local state of $A_j$ corresponding to $\psi$. Let $\bar{w}_j$ be the expected work transferred to $A_j$; let $\bar{\kappa}_j$ be the expected information $A_j$ has about the system; let $\bar{s}_j$ be the expected generated work arriving at $A_j$. Note that these are expected values as viewed (i.e., computed) by $A_i$. Most likely, they will be time-dependent, but it is assumed that they change very slowly, and can conveniently be communicated when necessary with minimal overhead.

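Finally, recall that $g_{ij}(d_i)$ is the work function indicating what work will appear at $A_j$ based on the decision $d_i$ made by $A_i$. Let $\bar{w}_{ij}$ be the average work transferred from $A_i$ to $A_j$, and let $\frac{g_{ij}(d_i)}{\bar{w}_{ij}}\,\bar{w}_j$ be the same as the $\bar{w}_j$ defined above, except that its $i$th component, the work transferred to $A_j$ from $A_i$, is $g_{ij}(d_i)$. (Consequently, we are replacing $\bar{w}_{ij}$ with $g_{ij}(d_i)$ in $\bar{w}_j$.) We are now ready to introduce the third approximation.

$$U^*_{D_{ij}}(a_i) = \frac{1}{\beta} \sum_{\delta_i \in D_{ij}} \sum_{\psi \in \Gamma_i(\delta_i)} c\Big( u_i\big(f_i(y_i, \delta_i)\big),\; u_j\big(f_j([\psi]_j, \gamma_j((\tfrac{g_{ij}(\delta_i)}{\bar{w}_{ij}}\,\bar{w}_j, \bar{\kappa}_j), \bar{s}_j))\big) \Big)\, p(\psi \mid \kappa_i) \qquad (4.17)$$

$$U_{D_{ij}}(a_i) = \frac{1}{\beta} \sum_{\delta_i \in D_{ij}} \sum_{\psi \in \Gamma_i(\delta_i)} c\Big( u_i\big(f_i(y_i, d_i)\big),\; u_j\big(f_j([\psi]_j, \gamma_j((\tfrac{g_{ij}(d_i)}{\bar{w}_{ij}}\,\bar{w}_j, \bar{\kappa}_j), \bar{s}_j))\big) \Big)\, p(\psi \mid \kappa_i) \qquad (4.18)$$

Thus, the third approximation for $L_i^{(D)}(a_i)$ is

$$L_i^{(D)}(a_i) \approx U^*_{D_{ij}}(a_i) - U_{D_{ij}}(a_i) \qquad (4.19)$$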


The main difference between the third approximation given by (4.19) and the second approximation given by (4.16) is that the next state functions are local rather than global in (4.17) and (4.18). Further, we consider a combination of the local state utilities of $A_i$ and $A_j$, not the global state utility. Notice that, since $A_i$'s state is known with certainty (since $A_i$ is computing these utilities), the local current state of $A_i$ used in the next state function $f_i$ is $y_i$, $A_i$'s true local state. Similar to the second approximation, the expectations are taken over all global states which could trigger decisions affecting $A_j$ by $A_i$. Thus, the local state of $A_j$ used in its next state function $f_j$ is $[\psi]_j$. The second parameter of $f_j$ is $A_j$'s decision, computed by using its decision rule $\gamma_j$ with expected values, except for the transferred work, which depends on $A_i$'s decision. This assumes that the global state $\psi$, which triggered $A_i$ to make its decision, is still in effect up to the time when $A_j$ receives any transferred work. (Receiving work does not mean that $A_j$ has been affected by the work transfer in any significant way, at least according to the design of $A_j$'s abstract state space, which is much coarser than its low-level state space $X_j$. In fact, when $A_j$ does get affected, it will have changed state because it decided to either accept the work, or to transfer it somewhere else.)

Fourth  Approximation 

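Using (4.19) as the loss approximation still poses the problem of knowing the global state transition probabilities, $p(\psi \mid \kappa_i)$. This leads us to the fourth approximation, which is based on the expected local state utilities of $A_i$, taken over all possible local states of $A_j$ (since it is $A_j$'s state of which decisionmaker $A_i$ is uncertain):

$$U^*_{ji}(a_{ji}) = \sum_{\psi_j \in Y_j} c\Big( u_i\big(f_i(y_i, (d_i \mid \psi_j))\big),\; u_j\big(f_j(\psi_j, \gamma_j((\tfrac{g_{ij}(d_i \mid \psi_j)}{\bar{w}_{ij}}\,\bar{w}_j, \bar{\kappa}_j), \bar{s}_j))\big) \Big)\, p(\psi_j \mid \kappa_{ji}) \qquad (4.20)$$

$$U_{ji}(a_{ji}) = \sum_{\psi_j \in Y_j} c\Big( u_i\big(f_i(y_i, d_i)\big),\; u_j\big(f_j(\psi_j, \gamma_j((\tfrac{g_{ij}(d_i)}{\bar{w}_{ij}}\,\bar{w}_j, \bar{\kappa}_j), \bar{s}_j))\big) \Big)\, p(\psi_j \mid \kappa_{ji}) \qquad (4.21)$$

Thus, the fourth approximation for $L_i^{(D)}(a_{ji})$ is

$$L_i^{(D)}(a_{ji}) \approx U^*_{ji}(a_{ji}) - U_{ji}(a_{ji}) \qquad (4.22)$$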

Let $(d_i \mid \psi_j)$ denote the decision similar to $d_i$ given by decision rule $\gamma_i((w_i, \kappa_i), s_i)$, except that the state information $\kappa_i$ has its $j$th component substituted with $\psi_j$. Thus, in (4.22) we are approximating the expected loss in utility of $A_i$'s and $A_j$'s next abstract states, not by considering all possible decisions which will affect $A_j$, but only decisions triggered by $A_j$'s possible states. The conditional probability density of each possible state $\psi_j$ given a past state $\kappa_{ji}$ is $p(\psi_j \mid \kappa_{ji})$. (Recall that $\kappa_i(t)$ equals $(y_1(t - a_{1i}), \ldots, y_N(t - a_{Ni}))$; therefore, $\kappa_{ji}(t) = y_j(t - a_{ji})$, which is used in the formula.) We use this fourth approximation in our load-balancing experiments to compute decision quality loss as a function of aging information.
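
The structure of the fourth approximation can be made concrete with a toy model. Everything specific below is an assumption for illustration only: the remote agent has two abstract states (idle, busy) that evolve as a Markov chain with a made-up matrix `P`, the decisionmaker's rule is "transfer one task only if the remote agent is believed idle", the utility values are invented, the combining function is the arithmetic mean, and the remote agent's own decision is folded into its utility table rather than modeled through its decision rule. The loss is the expected combined utility when the decision is based on the remote agent's current state, minus the expected combined utility when it is based on the aged report.

```python
IDLE, BUSY = 0, 1

# Hypothetical per-step transition matrix for the remote agent's abstract state.
P = [[0.9, 0.1],
     [0.2, 0.8]]


def belief(reported, age):
    """Distribution of the remote state `age` steps after it was reported."""
    dist = [1.0 if s == reported else 0.0 for s in (IDLE, BUSY)]
    for _ in range(age):
        dist = [sum(dist[i] * P[i][j] for i in (IDLE, BUSY)) for j in (IDLE, BUSY)]
    return dist


def rule(state_of_remote):
    """Decision rule: transfer one task only if the remote agent is believed idle."""
    return state_of_remote == IDLE


def combined_utility(remote_state, transfer):
    """Arithmetic mean of the two local next-state utilities (made-up values)."""
    u_local = 0.8 if transfer else 0.4
    u_remote = {(IDLE, True): 1.0, (IDLE, False): 0.8,
                (BUSY, True): 0.0, (BUSY, False): 0.6}[(remote_state, transfer)]
    return (u_local + u_remote) / 2


def loss(reported, age):
    """Expected utility with current knowledge minus that with the aged report."""
    p = belief(reported, age)
    u_star = sum(combined_utility(s, rule(s)) * p[s] for s in (IDLE, BUSY))
    u_aged = sum(combined_utility(s, rule(reported)) * p[s] for s in (IDLE, BUSY))
    return u_star - u_aged


losses = [loss(IDLE, age) for age in range(6)]
```

In this toy model the loss is zero for a fresh report and grows with age as probability leaks from the reported state, which is the qualitative behavior the decision quality loss is meant to capture.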

This concludes our discussion about approximating the loss caused by decision quality degradation due to aging information. It is interesting to note that, in our quest to simplify $L_d$ in order to reduce $L_e$, which is the loss due to evaluating the decision rule, we are effectively increasing $L_r$, the loss due to random effects. A reason for this is the statistical nature of our approximations, which are expectations over multiple variables. Although it is beyond the scope of this dissertation to quantify $L_e$ and $L_r$, we have introduced them to focus on the tradeoff between the ease and speed of computing the decision rule, and the degree of error such a computation produces.

4.6.5.  Communication  Loss  Function 

Assume  for  a  moment  that  the  time  devoted  to  the  act  of  communicating,  i.e.,  to 
the  construction  of  messages  to  be  transmitted  and  to  the  interpretation  of  messages 
received,  is  simply  wasted  time.  Ultimately,  we  would  like  to  say  that,  if  an  agent 
wastes  a  certain  percentage  of  its  time,  there  will  be  a  known  or  measurable  degrada¬ 
tion  in  performance,  which  will  manifest  itself  as  a  loss  in  future  state  utility. 

Of course, although communication overhead represents a loss in processing time and therefore a decrease in utility, it is expected that the fruits of communicating with other agents (i.e., the value of new information) will cause an increase in utility larger than the decrease due to overhead. When this is the case, it pays to communicate.

Thus,  the  purpose  of  quantifying  loss  due  to  communication  overhead  is  to  deter¬ 
mine  the  benefit  vs.  cost  tradeoff  of  communication.  For  some  rate  of  inter-agent 
communication,  the  combined  losses  due  to  communication  overhead  and  to  the  dec¬ 
lining  value  of  aging  information  are  at  a  minimum.  The  goal  is  to  find  that 
minimum  point,  and  to  communicate  at  the  corresponding  frequency  between  each 
pair  of  agents. 

Each agent must quantify its communication overhead as a function of its frequency of communication. If an agent $A_i$ communicates with a remote agent $A_j$, let

$$F_{ji} = \text{frequency of communication of } A_i \text{ with } A_j.$$

In terms of intercommunication periods, let

$$T_{ji} = 1 / F_{ji}$$

be the period of communication between $A_i$ and $A_j$. Define the average time between all communications by $A_i$ as

$$T_i = \frac{1}{\sum_{j=1}^{N} F_{ji}}. \qquad (4.23)$$

Note the difference between $T_i$ defined by (4.23), and $\mathbf{T}_i$, which is the vector of intercommunication periods $(T_{1i}, T_{2i}, \ldots, T_{Ni})$ between $A_i$ and $A_j$, for all $j$. We alluded to $\mathbf{T}_i$ in Section 4.6.1, and make use of it in this section. The communication overhead loss for agent $A_i$ will be a function of $T_i$ since it depends solely on the total rate of communication, without regard for the relative magnitudes of individual communication rates between $A_i$ and any particular remote agent. Thus, the communication overhead loss is denoted by

$$L_i^{(C)}(T_i).$$

On the other hand, the decision quality loss does depend on the relative ages of information about remote agents on which $A_i$ will base its decisions, so the individual intercommunication periods cannot be combined, but must remain as a vector:

$$L_i^{(D)}(\mathbf{T}_i).$$

Let $h_i$ be the average overhead in CPU time to support a single communication
between $A_i$ and another agent. Thus, $h_i/T_i$ is the fraction of cycle time
(on the average) spent by $A_i$ communicating, and a necessary constraint is that
$h_i/T_i < 1$; otherwise all time is spent communicating and nothing else gets
done.

We need to determine how $h_i/T_i$ affects the next state function, and
ultimately how it affects the expected utility of future states
$E[u(x(t+1, \tau+1))]$, for some $\tau \ge 1$. The reason for introducing the
fraction of time $h_i/T_i$ is that one can often conveniently express a loss of
some measure of performance (one that relates to the abstract state space, and
consequently to utility) in terms of this fraction. For example, if half of an
agent's time is spent communicating, its effective rate of processing tasks drops
by at least a factor of two, and therefore the total time to complete a task is
at least doubled.
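The relationship between the per-agent periods, the average period of (4.23), and the resulting slowdown can be illustrated with a minimal sketch. The function names are hypothetical, and the slowdown formula is only the lower bound implied by the "factor of two" example, not a model taken from this chapter.

```python
def average_period(periods):
    """Average time between all communications: 1 / sum of per-agent frequencies."""
    return 1.0 / sum(1.0 / t for t in periods)


def overhead_fraction(h, periods):
    """Fraction of time spent communicating, h / T_i; must stay below 1."""
    fraction = h / average_period(periods)
    if fraction >= 1.0:
        raise ValueError("all time would be spent communicating")
    return fraction


def slowdown_factor(fraction):
    """Lower bound on how much longer a task takes when `fraction` of time is lost."""
    return 1.0 / (1.0 - fraction)
```

With two remote agents each contacted every 4 time units, the agent communicates every 2 time units on average; an overhead of 1 time unit per communication then consumes half of its time, which at least doubles task completion time.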

Our objective is to find values for periods $T_{ji}$, for all $j$, which minimize

$$L_c^{(i)}(T_i) + L_d^{(i)}(\mathbf{T}_i).$$

Achieving this objective raises several problems, especially when the distributed
system has a large number of agents. First, all the $T_{ji}$'s must be varied
simultaneously to find their optimal values; it may be difficult to find a way of
doing this efficiently. Second, the loss function $L_d^{(i)}(\mathbf{T}_i)$ will
change dynamically, depending on the rate at which the global system state
changes. Third, as pointed out in Section 4.6.4, it is often the case that the
best an agent can do is to compute an approximation to $L_d$, such as the
pair-wise decision quality loss between two agents given by (4.22), from which it
can compute $L_d^{(i)}(T_{ji})$ using (4.6).

In light of these formidable problems, we propose the following method for
determining the update period between two agents. This solution is based on
preallocating a maximum communication bandwidth between every pair of agents, and
then minimizing a local rather than global loss function.

The first step is to recognize that, for small values of $T_i$,

$$L_c^{(i)}(T_i) \gg L_d^{(i)}(\mathbf{T}_i).$$

From (4.23), it is possible that $T_i$ is small even though some $T_{ji}$ is large
(this is true if there is some $T_{ki}$, $k \ne j$, which is small). Yet, $L_c$
can be much greater than $L_d$ since $L_c^{(i)}(T_i)$ goes to infinity as $T_i$
approaches zero, whereas $L_d^{(i)}(\mathbf{T}_i)$ has an upper bound for large
values of any component $T_{ji}$ of $\mathbf{T}_i$ (see Section 4.6.1). This
simply means that when the frequency of communication is very high (with any
remote agent), the loss due to communication overhead will dominate the loss due
to decision quality degradation. For small values of $T_i$ we will approximate
the sum of $L_c$ and $L_d$ with just $L_c$, ignoring the relatively small
contribution of $L_d$.

Next, we determine the minimum value of $T_i$ such that $L_c^{(i)}(T_i)$ is
tolerable. We will call this value $T_{i\mathrm{MIN}}$, and the corresponding
frequency

$$F_{i\mathrm{MAX}} = \frac{1}{T_{i\mathrm{MIN}}}.$$

$F_{i\mathrm{MAX}}$ corresponds to the maximum bandwidth used by $A_i$ for
communicating with all remote agents (i.e., the sum of the communication
frequencies between $A_i$ and every other agent must be at most
$F_{i\mathrm{MAX}}$).

This maximum bandwidth is now divided into per-agent maximum bandwidths,
$F_{ji\mathrm{MAX}}$. Of course, this distribution need not be a uniform
allocation; rather, the amount allocated for a particular remote agent will depend
on the expected needs for communication with that agent. For example, an agent
will expect to communicate more frequently with a neighboring agent than with a
very distant agent; thus, more bandwidth should be preallocated for communication
with neighboring agents.

To achieve such a distribution, we must define a measure of desirability on a
per-agent basis, as determined by agent $A_i$. Let $\sigma_{ji}$ be $A_i$'s
measure of desirability for $A_j$. The definition of $\sigma_{ji}$ will depend on
the application. For example, a definition based on proximity is

$$\sigma_{ji} = \frac{1}{\mathrm{distance}(A_i, A_j)},$$

where $\mathrm{distance}(A_i, A_j)$ is the expected transmission time from $A_i$
to $A_j$. Or, $\sigma_{ji}$ may be based on the relative computing power of the
agents.

Once the $\sigma_{ji}$ are determined for all $j$, let

$$\sigma_i = \sum_{j=1}^{N} \sigma_{ji}.$$

Now, we can simply allocate maximum per-agent bandwidths as follows.

$$F_{ji\mathrm{MAX}} = F_{i\mathrm{MAX}} \cdot \frac{\sigma_{ji}}{\sigma_i}.$$

Once the $F_{ji\mathrm{MAX}}$'s are determined, an iterative algorithm such as the
following can be used to determine the $T_{ji}$'s.

1. for all $j$, let $T_{ji} = 1/F_{ji\mathrm{MAX}}$

2. for all $j$, in order of decreasing $\sigma_{ji}$, find
   $T_{ji} \ge 1/F_{ji\mathrm{MAX}}$ that minimizes

$$L_c^{(i)}(T_i) + L_d^{(i)}(T_{ji}) + \sum_{k=1,\, k \ne j}^{N} L_d^{(i)}(T_{ki})$$

Step 1 initializes the $T_{ji}$'s to their minimum values. Step 2 considers each
$T_{ji}$ in order of importance (based on $\sigma_{ji}$), and may increase its
value to minimize the objective function. In minimizing the objective function,
only $T_{ji}$ is allowed to vary; the $T_{ki}$'s are kept constant.
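The proportional bandwidth allocation and the two-step iteration can be sketched as follows. This is a minimal illustration under stated assumptions: the two loss functions are hypothetical stand-ins (an overhead loss that is unbounded as the average period goes to zero, and a bounded decision quality loss), the one-dimensional minimization is a plain grid search up to an assumed cap `t_cap`, and all function names are invented for the example.

```python
import math


def allocate_bandwidth(f_max, sigma):
    """Split the total bandwidth f_max in proportion to each desirability sigma[j]."""
    sigma_total = sum(sigma.values())
    return {j: f_max * s / sigma_total for j, s in sigma.items()}


def average_period(periods):
    """Equation (4.23): reciprocal of the summed per-agent frequencies."""
    return 1.0 / sum(1.0 / t for t in periods.values())


def overhead_loss(t_avg, h=0.5):
    """Assumed L_c: grows without bound as the average period shrinks to zero."""
    return h / t_avg


def decision_loss(t, l_max=1.0, tau=5.0):
    """Assumed L_d: grows with information age but is bounded above by l_max."""
    return l_max * (1.0 - math.exp(-t / tau))


def objective(periods):
    """Overhead loss plus the sum of pair-wise decision quality losses."""
    return overhead_loss(average_period(periods)) + sum(
        decision_loss(t) for t in periods.values())


def choose_periods(f_max, sigma, t_cap=50.0, grid_steps=200):
    f_ji_max = allocate_bandwidth(f_max, sigma)
    # Step 1: start every period at its minimum value 1 / F_jiMAX.
    periods = {j: 1.0 / f for j, f in f_ji_max.items()}
    # Step 2: visit agents by decreasing desirability, varying one period at a time.
    for j in sorted(sigma, key=sigma.get, reverse=True):
        t_min = 1.0 / f_ji_max[j]
        best_t, best_val = periods[j], objective(periods)
        for step in range(grid_steps + 1):
            candidate = t_min + max(t_cap - t_min, 0.0) * step / grid_steps
            value = objective({**periods, j: candidate})
            if value < best_val:
                best_t, best_val = candidate, value
        periods[j] = best_t
    return f_ji_max, periods
```

Because each pass only accepts a candidate that lowers the objective, the final periods are never worse than the step 1 initialization, and no period falls below its preallocated minimum.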

The $T_{ji}$'s are then recalculated by $A_i$ whenever there is a change in
$A_j$'s cpdf, $p(y_j(t) \mid y_j(t - a_{ji}))$ (which $A_j$ must explicitly
communicate to $A_i$). There are several optimizations possible here. In step 2, a
recalculation of every $T_{ki}$ need not occur. If the change in $A_j$'s cpdf is
such that the probability mass spreads more slowly over the state space with
increasing $a_{ji}$, then $T_{ji}$ can be increased, and any $T_{ki}$ which is not
at its minimum value $1/F_{ki\mathrm{MAX}}$ can now decrease (to improve decision
quality). Again, the order of checking each $T_{ki}$ should be based on decreasing
values of $\sigma_{ki}$. If the change in $A_j$'s cpdf is such that the
probability mass spreads more rapidly over the state space with increasing
$a_{ji}$, then $T_{ji}$ should be decreased, unless it was already at its minimum
value, in which case no recalculation is necessary. Thus, recalculation only takes
place when necessary (as opposed to doing it, say, periodically). We use static
preallocation of bandwidth in the experiments described in Chapter 6.

A potential improvement with respect to static preallocation of bandwidth is
always to keep in reserve some bandwidth for dynamic allocation. Thus, let

$$F_{i\mathrm{MAX}} = F_{i\mathrm{SMAX}} + F_{i\mathrm{RSRV}},$$

where $F_{i\mathrm{SMAX}}$ is the maximum bandwidth allocated statically, and
$F_{i\mathrm{RSRV}}$ is reserved bandwidth allocated dynamically. Also, let

$$F_{ji\mathrm{SMAX}} = F_{i\mathrm{SMAX}} \cdot \frac{\sigma_{ji}}{\sigma_i}, \quad \text{and} \quad F_{ji\mathrm{RSRV}} = F_{i\mathrm{RSRV}} \cdot \frac{\sigma_{ji}}{\sigma_i}.$$

The $T_{ji}$, for all $j$, are determined statically in a manner similar to the
one described above, except that $F_{ji\mathrm{SMAX}}$ is used in step 1 rather
than $F_{ji\mathrm{MAX}}$. Reserved bandwidth from $F_{ji\mathrm{RSRV}}$ is
allocated in step 2 whenever $T_{ji}$ needs to be decreased. Again, the order of
allocation is based on $\sigma_{ji}$.

4.7.  SPACE/TIME  Randomization  to  Avoid  Resonances 

The seventh design principle is to use SPACE/TIME randomization to avoid
resonances. In Section 3.4, we stated the second fundamental problem of
decentralized control: an agent is uncertain about remote agent actions. Even if
all agents followed the same decision rules, an agent could not predict the
actions of a remote agent since it would not know that agent's view of the system
state, nor would it know that agent's influences. The situation where the
concurrent local decisions made by all agents (which make up the global decision)
are mutually conflicting, thereby causing the system to go into an undesirable
global state, is called a resonance in the system.

The decisions that can cause resonances have to do with work transfer. If the
decentralized control problem is, for example, load balancing, work transfer means
offloading processes from one machine to another. If the decentralized control
problem is, for instance, network routing, work transfer means sending messages
from one machine, through intermediate machines, to a specific destination
machine. One characteristic of large distributed systems is that, in general,
there are many potential candidates for work to be transferred (e.g., many
less-loaded machines for load balancing, many low-traffic routes for network
routing), simply because there are many machines and many routes to start out
with. There may be a best candidate, but many others can be nearly as good.

To avoid resonances, a number of solutions might be proposed. For example, the
agents could coordinate their actions by making agreements on how to act. This
might involve taking votes, establishing contracts, and so on. In general, this
may involve several rounds of communication among the agents, which incurs a cost
in time that we are generally not willing to pay.

Our solution is based on agents making decisions which take the possibility of
remote agent actions into account without explicitly communicating. Given its
view of the global system state, an agent can make reasonable inferences about
the possible actions of remote agents. Resonances can be avoided by randomly
selecting one of the many good candidates. Assuming that all agents randomize, and
they will since our model of agents assumes they are cooperative and not
adversarial, the chances for mutually conflicting decisions are reduced. We call
this global coordination by implicit communication since the coordination occurs
based on past state observations, inferences, and pre-established conventions
programmed into the agents. Since explicit communication is minimized, the time
frame for making decisions is minimized, and efficiency is likely to be higher.

4.7.1.  Problem  Formalization 

Let us first formalize the problem, and define what a resonance is in terms of
utility loss. Given its current information $k_i(t)$ about the system state, and
input $s_i(t)$ indicating newly generated work entering the system, an agent
$A_i$ must decide where this new work must be performed. Thus, there will
generally be multiple work-transfer type decisions, which we will denote by a set
of $\delta_k$'s, each representing agent $A_i$'s decision to transfer work to a
particular agent $A_k$. This set of possible decisions will also include the null
decision, $d_0$, which means to keep the work locally (either for performing it or
for possible transfer at a later time).

To select which decision is best, the expected consequences of each decision must
be considered; this can be expressed as expected utilities of probable future
states, or by a directional heuristic providing an expected positive change in
utility. With a directional heuristic, a measure of the consequence of the
decision $\delta_k$ is given by a real number $\Delta_k$, which is the change in
utility expected if agent $A_i$ selects the decision to transfer work to $A_k$.
Using game theory terminology, $\Delta_k$ is the decisionmaking agent's payoff for
selecting the decision $\delta_k$, which will have to be maximized by $A_i$.

A major problem with this formulation is that any realistic payoff $\Delta_k$
will depend not only on $A_i$'s selection of $\delta_k$, but also on the decisions
possibly made by all other agents at that time. We have referred to this
collective decision as $\mathbf{d}(t)$. Thus, $\Delta_k$ generally depends on
$\mathbf{d}(t)$, with the restriction that $d_i(t) = \delta_k$. The problem is
that the number of possible values $\mathbf{d}(t)$ can take on is very large, and
further that an agent must know
$\mathbf{k}(t) = (k_1(t), k_2(t), \ldots, k_N(t))$ (along with a number of other
things) to compute $\mathbf{d}(t)$. So, for all practical purposes, the true value
of payoff $\Delta_k$ cannot be known.

We call $\Delta_k$'s dependence on $\delta_k$ a direct dependence, and the
dependencies on decisions by other agents indirect dependencies. Consider what
would happen if $A_i$ simply ignored indirect dependencies, and computed a payoff
$\Delta(\delta_k)$, a function based solely on $\delta_k$. Assuming that the
indirect dependence of $\Delta_k$ on any one particular $d_j(t)$ is small, then
either the combined effects of all the indirect dependencies will be negligible
so that

$$\Delta(\delta_k) \approx \Delta_k,$$

or they will be additive or multiplicative such that the difference between
$\Delta(\delta_k)$ and $\Delta_k$ is large. Define a resonance as the condition
where

$$\Delta(\delta_k) > 0 \ \text{ AND } \ \Delta_k \ll 0.$$

Thus, a resonance is where a seemingly optimal decision, based on an agent's
local information, becomes a global disaster due to indirect effects. We are then
faced with a dilemma: either an agent attempts to take into account indirect
dependencies, making the loss due to evaluating the decision rule, $L_e$, large,
or the indirect dependencies are ignored, at the risk of making the loss due to
decision quality degradation, $L_d$, large.

We  base  our  solution  to  this  problem  on  an  assumption  about  the  likelihood  of 
resonances:  this  assumption  is  that  of  all  the  possible  collective  decisions  made  by  all 
agents  at  a  given  point  in  time,  only  a  small  fraction  of  them  cause  resonances.  But 
the  probability  that  a  decision  will  be  selected  from  this  small  fraction  of  decisions  is 
high  enough,  and  the  consequent  payoffs  are  bad  enough,  that  care  must  be  taken  to 
avoid  selecting  such  decisions. 

Our solution then is to use the payoff $\Delta(\delta_k)$, which depends solely
on the local decision but ignores resonances, and to build into the decision rule
a method for avoiding resonances. This method must be cheap so that the decision
rule is easy to evaluate.

4.7.2.  The  SPACE/TIME  Randomization  Technique 

We now present a technique, which we call SPACE/TIME randomization, for
distributed decisionmaking, whose goal is to minimize the occurrence of
resonances, and yet achieve coordination with minimal communication. With respect
to work transfers, an agent would determine which decision, given state
information about remote agents (along with the associated measures of
uncertainty), will produce the highest payoff. First, in defining the payoffs, use
a payoff $\Delta(\delta_k)$ which depends solely on the decision $\delta_k$. Then,
let the decision rule produce all the decisions $\delta_p, \delta_q, \ldots$ whose
payoffs, $\Delta(\delta_p), \Delta(\delta_q), \ldots$, are positive. Included also
is the null decision $d_0$ (regardless of its payoff), which indicates no transfer
of work. (Making the null decision provides an agent with a mechanism for delaying
the transfer of work, as we will see shortly.) Thus, the set of possible decisions
is

$$D_\delta = \{\delta_p, \delta_q, \ldots, d_0\}.$$

Once $D_\delta$ is determined, one decision must be selected. Since agents are to
act rationally, the best decision to select is the one which will produce the
highest payoff. But the computed payoffs are only approximations to the real
payoffs in that they ignore indirect dependencies, thus increasing the
probability of a resonance. A resonance will occur when a large number of
work-transfer decisions (made independently of each other) cause work to go to an
unexpectedly small number of agents (due to their ignored dependencies: they all
select the "best" agent, which may be the same for all decisionmaking agents). To
minimize this problem, a decision is randomly selected from $D_\delta$.

The randomization is accomplished by building a probability distribution over the
possible decision set in the following manner. Let $p_k$ be the probability that
decision $\delta_k$, with payoff $\Delta(\delta_k)$, is selected, for each
work-transfer decision $\delta_k$ in $D_\delta$. Also, let $p_{d_0}$ be the
probability that the null decision is selected.

Define

$$p_k = (1 - p_{d_0}) \, \frac{\Delta(\delta_k)}{\sum_{\delta_j \in D_\delta - d_0} \Delta(\delta_j)}, \quad \text{for all } \delta_k \in D_\delta - d_0, \tag{4.24}$$

and define

$$p_{d_0} = f_{d_0}(D_\delta, k_i(t)), \tag{4.25}$$

where $f_{d_0}$ is a function which produces a probability, and which will be
discussed shortly.

Note that (4.24) and (4.25) together indeed constitute a complete probability
distribution:

$$\sum_{\delta_k \in D_\delta - d_0} p_k + p_{d_0} = 1.$$

The probability of choosing decision $\delta_k$ is proportional to the decision's
payoff relative to the other decisions. Thus, the "best" agent is not always
selected. Rather, one of a number of "good" agents is selected, with the idea that
multiple decisionmakers spread work amongst these agents in accordance with their
degree of goodness.
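The construction of the distribution in (4.24) and (4.25), followed by a random draw, can be sketched as below. The names `NULL`, `decision_distribution`, and `select_decision` are hypothetical, the null-decision probability is passed in as a plain number rather than computed, and returning the null decision with certainty when no payoff is positive is an assumption of this sketch.

```python
import random

NULL = "d0"  # hypothetical label for the null decision (keep the work locally)


def decision_distribution(payoffs, p_null):
    """Map each candidate recipient to its selection probability, as in (4.24).

    payoffs: dict of recipient -> locally computed payoff; only positive
    payoffs enter the decision set. p_null plays the role of (4.25).
    """
    positive = {k: v for k, v in payoffs.items() if v > 0}
    if not positive:
        return {NULL: 1.0}  # assumption: nothing worth transferring to
    total = sum(positive.values())
    dist = {k: (1.0 - p_null) * v / total for k, v in positive.items()}
    dist[NULL] = p_null
    return dist


def select_decision(dist, rng=random):
    """Draw one decision at random according to the distribution."""
    decisions = list(dist)
    weights = [dist[d] for d in decisions]
    return rng.choices(decisions, weights=weights, k=1)[0]
```

For payoffs of 3 and 1 and a null probability of 0.2, the two recipients are chosen with probabilities of about 0.6 and 0.2, so the better candidate is favored but not always selected.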

This works well, provided the size of $D_\delta$ is not small and the number of
agents which are likely to make work-transfer decisions is not large. What if this
is not the case? In this situation, we would like only a fraction of the agents
which desire to transfer work to actually do so. The fraction should be large
enough to take advantage of agents which can accept work, but small enough to
avoid a resonance. Thus, there should be a mechanism for some agents to abstain
from transferring work; this is why the null decision $d_0$ is part of the
decision set $D_\delta$.

Since we are seeking a solution where agents need not explicitly coordinate their
actions to decide who should send and who should not, we resort again to
randomization. An agent will select the null decision with probability $p_{d_0}$.
It is defined above as the function $f_{d_0}$, whose range is $[0,1]$, and is
based on the decisionmaking agent's decision set $D_\delta$ and the current state
information $k_i(t)$ ($A_i$ is the decisionmaker). Although the details of
$f_{d_0}$ are application dependent and therefore left open, we offer the
following guidelines for its construction. It should decrease when the size of
$D_\delta$ increases, since a larger number of possible decisions implies a higher
probability of the spreading of work, and therefore a lower probability that a
resonance will occur. On the other hand, it should increase as the number of
agents likely to transfer work increases, as may be inferred from $k_i(t)$, since
this implies a higher probability that a resonance will occur. A more complicated
$f_{d_0}$ would also account for the distribution of payoffs in the decision set
(e.g., if the payoffs are all nearly equal, a resonance is less likely than if one
decision has a very large payoff and the others have very small ones), and for the
degree of uncertainty of state information (e.g., the more uncertain the state
information is, the harder it is to predict the probability of a resonance, and
consequently a conservative approach would be to increase $p_{d_0}$). An example
expression for $f_{d_0}$ and how it is derived is given in Chapter 5.
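To make the first two guidelines concrete, here is one hypothetical form of the null-decision probability. It is not the expression derived in Chapter 5; it is an illustrative assumption that simply decreases with the number of candidate recipients and increases with the estimated number of agents likely to transfer work.

```python
def null_probability(num_candidates, est_senders):
    """Hypothetical f_d0: probability of holding work instead of transferring it.

    num_candidates: number of work-transfer decisions in the decision set.
    est_senders: estimated number of agents likely to transfer work,
                 inferred from the agent's current state information.
    """
    if num_candidates <= 0:
        return 1.0  # nowhere to send work, so always keep it
    if est_senders <= num_candidates:
        return 0.0  # enough recipients for every likely sender
    # Let roughly num_candidates of the est_senders agents transfer work.
    return 1.0 - num_candidates / est_senders
```

Under this assumed form, two candidate recipients and eight likely senders give a null probability of 0.75, so on average about two senders transfer work in a given round.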

We now see that, to minimize the occurrence of resonances, a decisionmaker selects
its work-transfer decision by randomizing over the space of possible recipient
agents when it selects a decision $\delta_k$. The decisionmaker randomizes over
time when it makes the null decision, in a sense holding onto its work, possibly
to transfer it at a later time. The technique has the advantage of requiring no
explicit coordination; rather, agents implicitly coordinate by inferring what
they can about the global state through infrequent communication, and by acting
on the common knowledge that all agents will SPACE/TIME randomize their
work-transfer decisions.
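The contrast between every sender picking the "best" recipient and every sender randomizing over space and time can be illustrated with a toy round of decisions. This is a sketch under simplifying assumptions that are not part of the text: all senders share the same positive payoffs, the null-decision probability is a fixed number, and the function names are invented.

```python
import random
from collections import Counter


def greedy_round(n_senders, payoffs):
    """Every sender independently picks the highest-payoff recipient."""
    best = max(payoffs, key=payoffs.get)
    return Counter({best: n_senders})


def randomized_round(n_senders, payoffs, p_null, rng):
    """Every sender applies SPACE/TIME randomization independently."""
    recipients = list(payoffs)
    weights = [payoffs[r] for r in recipients]
    load = Counter()
    for _ in range(n_senders):
        if rng.random() < p_null:
            continue  # TIME: hold the work, possibly transferring it later
        chosen = rng.choices(recipients, weights=weights, k=1)[0]
        load[chosen] += 1  # SPACE: spread work in proportion to payoff
    return load
```

In the greedy round all work lands on a single agent, which is the resonance described above; in the randomized round the same senders spread their work over the good candidates, and some of them defer.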

4.8.  Towards  a  Unifying  Framework  for  Intelligent  Agent  Design 

We now propose the organization of our principles of decentralized control design
into a single unifying framework which imposes a structure on the reasoning and
decisionmaking processes of each agent. In particular, the framework facilitates
the management of a hierarchy of beliefs and actions. We call this a framework for
intelligent agent design.

4.8.1.  Observe-Reason-Act  Structure 

The operation of a classical control system is generally based on the simple and
logical observe-act loop, shown in Figure 4.13. The controller observes the
environment through sensors, and then may issue a command (i.e., take action) to
affect the environment; this is done repeatedly. Actions are based directly and
solely on observation; consequently, this assumes that the environment is
sufficiently observable. The notions of observability and controllability are the
central concepts of classic control system design. (For a discussion of these and
related concepts, see [Padu74].)

Figure 4.13. Observe-act control structure.


A decentralized control system with a large number of controllers (the agents, in
our terminology) suffers from severe observability and controllability
limitations. Consider again the two fundamental problems of decentralized control.
Problem 1 is that no agent knows with complete certainty the current global system
state, and Problem 2 is that no agent knows with complete certainty the current
actions of other agents. It is interesting to note that Problem 1 has to do
directly with observability and Problem 2 with controllability; this confirms our
intuition that they are fundamental problems.

Returning  to  the  classical  controller  observe-act  loop,  we  see  that  this  simple 
framework  will  not  be  sufficient  for  our  purposes  since  both  global  observations  and 
global  actions  are  uncertain.  What  is  needed  is  the  ability  for  agents  to  reason  about 
observations  and  about  actions. 

It then becomes reasonable to extend the classical control framework to an
observe-reason-act loop, as shown in Figure 4.14. After an agent has observed the
system, it can reason about the implications of the observations concerning the
system state, and then take action based on this reasoning. This reasoning is
necessary since, although the agent may not be able to observe everything, it may
be able to hypothesize (which is a reasoning process) on the basis of the
observations it has been able to make. Indeed, such a process will require agents
to have powerful processors and large memories, requirements which were not
economically satisfiable in the past, but are today, and will be even more so in
the future.

Figure 4.14. Observe-reason-act control structure.


To be sure, "reasoning" does not necessarily imply a great cost in terms of
resources, but does imply a costly, time-consuming process. As efficiency is
certainly a major design objective, how can we propose to perform the reasoning
step, which is a potential bottleneck, in the middle of an agent's control loop?
The key here is to realize that not all actions need to be based on "reasoned
observations." Therefore, we do not want a hard design constraint stating that, in
order for an agent to take actions, it must do some type of reasoning. The
solution is to have a framework which allows reasoning to take place, but does not
preclude a simple observe-act control path for fast reactions.
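A control step that allows reasoning without precluding a fast observe-act path might be organized as in the following minimal sketch. The function and parameter names are hypothetical, and representing reflexes as an ordered list of (condition, action) pairs is an assumption made only for illustration.

```python
def control_step(observation, reflex_rules, reason, act):
    """Run one pass of the loop for a single observation.

    reflex_rules: ordered (condition, action) pairs forming the observe-act path.
    reason: slower function mapping an observation to an action, or None.
    act: function that carries out an action and returns its result.
    """
    for condition, action in reflex_rules:
        if condition(observation):
            return act(action)  # fast reaction: no reasoning required
    decision = reason(observation)  # observe-reason-act path
    return act(decision) if decision is not None else None
```

For example, a reflex rule could shed load as soon as an observed load crosses a high threshold, while moderately high loads are passed to the reasoning step before any work is transferred.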

4.8.2.  Architectural  Framework 

The framework for an agent's control architecture must capture the
observe-reason-act control loop as discussed above. It must have components
addressing each of the three (observe, reason, and act) processes. Further, the
interfaces between these components must reflect their functional relationships.
Such a framework is displayed in Figure 4.15.

Figure 4.15. Agent framework.


Notice that the framework indeed has an observe-reason-act structure:
observations enter the Sensory Input Interpreter, reasoning may then take place
between the Hypothesis Generator and the Belief Manager, and finally actions are
issued by the Action Generator, the Experiment Generator, or the Reflex Generator.
Before going on, we note that the word "generator" in these logical blocks is used
in the sense that their main function is to produce something after a potentially
elaborate processing of their inputs, similar to the way a code generator produces
machine code based on the intermediate results of the processing of a higher-level
program description. These generators may simply produce results through a simple
table lookup, or by applying a set of rules. The word "generator" is not meant to
mislead the reader into thinking that some form of creativity, with all the
nebulous connotations of this term, is required in these logical blocks.

4.8.3.  The  Observation/Interpretation  Structure 

The blocks called the Sensory Input Interpreter, the Hypothesis Generator, and the
Belief Manager in Figure 4.15 make up the observation/interpretation structure. A
useful way to view these blocks is as a stratification of information inside an
agent. Raw input signals (i.e., observations) enter at the bottom, and through an
interpretive process are transformed into information which rises to higher
levels, representing progressively higher degrees of abstraction. What form this
information takes on, and the definition of the appropriate degrees of
abstraction, will depend on the application.

Abstraction is useful because an agent's observations give only partial
information and indirect clues as to the real state of the distributed system.
Sometimes, these observations may even be misleading (e.g., the received state
information from a remote agent may not correspond to that agent's current state).
Yet, this information may imply a set of possible states, which constitute a
single abstract state. This limits what real states the agent needs to consider.
The agent can decide to get more information to reduce further the number of
possible real states. At some point, the agent's decision becomes insensitive to
the distinction between these possible real states, and so it can be made without
any further ado, even though the real state is not known with complete certainty.
Thus, the production and the maintenance of this stratification of information
allow actions to take place at different degrees of abstraction; actions can be
triggered by changes occurring at the different levels of reasoning.

Let us now look at each logical block in the framework in more detail. The Sensory
Input Interpreter accepts inputs from the environment. These inputs derive either
from the agent's private source of work ($s_i(t)$), or from communications with
other agents ($z_i(t)$). We use the name "sensory input interpreter" to denote the
act of sensing followed by a simple act of interpretation. Although more generally
the term "sensing" brings to mind probes sensing physical phenomena such as light
or temperature changes, our sensing is purely information-based. (There is
nothing, however, that precludes using our framework for agents which interact
more directly with a physical environment.) The simple interpretation of input may
consist of updating counts or statistics, detecting threshold crossings, or
modifying a probability that represents a measure of confidence about some piece
of state information.

Up  one  level,  the  Hypothesis  Generator  produces  hypotheses  (e.g.,  about  possible 
states).  The  reception  of  a  new  observation  from  the  Sensor}7  Input  Interpreter,  and 
the  interpretation  of  this  observation  in  light  of  current  beliefs,  will  cause  the  produc¬ 
tion  of  a  particular  set  of  hypotheses,  where  each  hypothesis  has  an  associated  proba¬ 
bility,  a  measure  indicating  the  chances  that  the  hypothesis  is  true.  These  hypotheses 
explain  the  new  observation  in  terms  of  its  implications  about  the  system  state;  in 
fact,  these  hypotheses  may  be  regarded  collectively  as  a  single  super-state  comprising 
all  the  system  states  it  implies  along  with  a  probability  distribution  function  defined 
over  these  possible  states.  The  probabilities  are  defined  by  the  past  behavior  of  the 
system  (e.g.,  by  the  frequency  counts  of  past  states).  Note,  however,  that  newT 
hypotheses  do  not  necessarily  imply  that  new  observations  have  been  received;  in  fact, 
the  lack  of  a  new  observation  (e.g.,  of  an  expected  message  from  a  remote  agent  which 
does  not  arrive)  may  cause  the  generation  of  new  hypotheses. 

Hypotheses go through a process of acceptance or rejection. This process is realized
through the modification of probabilities assigned to all hypotheses. As a
hypothesis  gains  more  support,  its  probability  is  raised.  Similarly,  as  a  hypothesis 
loses  support,  its  probability  is  lowered.  When  a  hypothesis’s  probability  rises  above 
a  high  threshold  (e.g.,  .8),  it  becomes  a  belief,  and  all  competing  hypotheses  are 
rejected.  A  number  of  experimental  systems  have  been  built  which  use  different 
methods  for  modifying  credibility  ratings  for  hypotheses  based  on  the  reception  of  new 
pieces of evidence, such as MYCIN [Shor75], Prospector [Duda76], Distributed Hearsay II
[Less80], AL/X [Reit81], and SPERIL [Ishi81], to name a few.
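
As a minimal illustration of the acceptance step described above, the following Python sketch promotes a hypothesis to a belief once its probability rises above a threshold. The function name `promote` and the dictionary representation are assumptions made for illustration; only the example threshold of 0.8 comes from the text.

```python
def promote(hypotheses, threshold=0.8):
    """Return (belief, remaining) for a dict mapping hypothesis -> probability.

    If the most probable hypothesis exceeds the threshold, it becomes the
    belief and all competing hypotheses are rejected. Otherwise no belief
    is formed and the full set is kept for further evidence.
    """
    best = max(hypotheses, key=hypotheses.get)
    if hypotheses[best] > threshold:
        return best, {}                # competitors are rejected
    return None, dict(hypotheses)      # keep gathering evidence
```

How probabilities are raised or lowered as evidence arrives is left open here, since the cited systems use different methods.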

The  uppermost  level  of  Figure  4.15  contains  the  Belief  Manager.  In  it  reside 
hypotheses which have been accepted over competing hypotheses which were eventually
rejected. Beliefs must be managed in the sense that they must be stored in a
database,  and  they  must  be  mutually  consistent  (i.e.,  there  can  be  no  contradictory 
beliefs).  Thus,  every  time  a  new  belief  is  created,  the  Belief  Manager  must  verify  that 
it  does  not  contradict  existing  beliefs.  If  there  are  contradictory  beliefs,  they  must  be 
removed  from  the  database.  This  activity  is  commonly  called  truth  maintenance 
[Doyl79]. 

4.8.4.  Actions  at  Different  Levels  of  Abstraction 

The  framework  in  Figure  4.15  provides  for  different  types  of  actions  at  different 
levels  of  abstraction  that  an  agent  may  take,  based  on  observation  and  reasoning.  An 
action  at  a  particular  level  of  abstraction  is  triggered  by  a  change  in  information 
occurring  at  that  same  level  of  abstraction  in  the  observation/interpretation  structure. 

At  the  lowest  level  is  the  Reflex  Generator;  it  produces  actions  which  are  trig¬ 
gered  by  changes  occurring  in  the  Sensory  Input  Interpreter.  As  the  name  implies, 
reflexes  are  quick  responses  to  a  simple  first-level  interpretation  of  new  observations, 
such  as  the  detection  of  a  threshold  crossing.  These  actions  are  not  based  on  reason¬ 
ing,  thus  establishing  a  simple  and  fast  observe-act  control  path. 

At the next higher level is the Experiment Generator, whose actions are triggered
by the Hypothesis Generator. When a set of competing hypotheses is generated, it is
desirable for the agent to determine which is the correct one or, at least, to reduce
their  number.  The  agent  can  passively  wait  for  new  information  to  arrive,  or  it  can 
take  an  active  role  by  generating  a  simple  test  designed  to  rule  out  some  of  the 
hypotheses.  These  tests  are  preprogrammed  and  attached  to  their  corresponding  set 
of competing hypotheses so that, when that set is proposed, the test is invoked. The
test  may  cause  actions  at  remote  agents,  through  the  sending  of  messages.  This 
causes  new  observations  (or  the  lack  of  expected  observations),  providing  further 
information for modifying the probabilities of the competing hypotheses. The combination
of the Hypothesis Generator and the Experiment Generator implements what is commonly
called the hypothesize-and-test problem-solving paradigm. (See [Less80] for an application
of this paradigm to the problem of distributed interpretation.)

At the highest level is the Action Generator, triggered by the Beliefs Manager.
At  this  level,  calculated  actions  based  on  the  agent’s  reasoning  capabilities  take  place. 
These  are  high-level  heavy-weight  decisions,  in  the  sense  that  they  have  wide-ranging 
effects, and therefore are not to be made lightly. Measured from the time particular
inputs are received, an action at this level occurs much later than a reflex.
Although reflexes occur on a frequent basis, and their effects are
quickly  perceived  and  short-lived,  the  effects  of  high-level  actions  may  not  fully  occur 
or  be  perceived  until  a  much  later  time,  but  are  expected  to  be  long-lasting.  These 
actions  may  be  viewed  as  implementing  global  goals. 

4.8.5.  How  the  Framework  Incorporates  Our  Principles 

As  the  reader  may  have  already  noticed,  our  principles  for  constructing  approxi¬ 
mate  solutions  for  decentralized  control  are  present  within  our  framework,  whose  pur¬ 
pose  is  to  unify  them. 

The  framework  encourages  a  knowledge-based  solution  through  the  use  of  rules 
that  transform  information  into  more  abstract  information  (i.e.,  control  flowing  up  the 
left-hand  side  of  the  framework),  or  rules  by  which  new  information  causes  different 
types  of  actions  (i.e.,  control  flowing  down  the  right-hand  side  of  the  framework). 

The  application  of  knowledge  abstraction  is  present  in  the 
observation/interpretation  structure.  Information  is  received  by  the  Sensory  Input 
Interpreter,  causing  higher-level  abstractions  to  be  generated  by  the  Hypothesis  Gen¬ 
erator  and  the  Beliefs  Manager. 

Quantification of uncertainty initially takes place at the Sensory Input Interpreter,
where an observation is given an initial measure of confidence. This measure
may  depend  on  a  number  of  factors,  such  as  the  transmission  time  between  sender 
and  receiver,  and  the  presence  of  noisy  channels.  The  generation  of  hypotheses  also 
involves  uncertainty  quantification.  For  example,  in  computing  a  conditional 
expected  utility  (i.e.,  a  belief  about  the  desirability  of  a  remote  agent),  a  number  of 
possible  states  (i.e.,  hypotheses)  are  considered  with  appropriate  probabilities.  In 
other  cases,  it  is  desirable  to  consider  the  hypotheses  more  carefully  by  generating 
experiments  to  acquire  more  evidence  before  arriving  at  a  belief.  Again,  this  involves 
modifying  a  measure  of  confidence.  Note  that,  in  all  cases,  the  principle  of  integrating 
information  aging  in  decisionmaking  is  also  involved,  as  the  probabilities  of 
hypotheses will generally change over time due to the system’s dynamic nature.
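
The conditional expected utility mentioned above can be sketched as a probability-weighted sum over hypothesized states. This is an illustrative sketch only: the function name `expected_utility`, the dictionary of hypotheses, and the example utility (the negative of the number of ready jobs) are assumptions, not definitions from the text.

```python
def expected_utility(hypotheses, utility):
    """Probability-weighted utility over hypothesized states.

    hypotheses: dict mapping a possible state to its probability.
    utility:    function giving the desirability of a state.
    The probabilities are normalized, so they need not sum to one.
    """
    total = sum(hypotheses.values())
    return sum(p * utility(s) for s, p in hypotheses.items()) / total
```

For instance, if a remote agent is hypothesized to have 2 or 4 ready jobs with equal probability, and fewer ready jobs are more desirable, `expected_utility({2: 0.5, 4: 0.5}, lambda b: -b)` weighs both possibilities rather than committing to either one.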

The  use  of  directional  heuristics  is  made  manifest  in  the  reason-act  part  of  the 
framework.  In  particular,  the  separation  of  actions  into  three  different  types  is  a 
direct consequence of the principle of using directional heuristics. Consequently,
different actions can be triggered by different levels of reasoning depending on their
expected tendencies (which may be positive at some levels but negative at others) to
increase utility.

The principle of frugal communication is present in the framework’s emphasis on
reasoning  about  observations,  and  creating  hypotheses  and  beliefs,  before  acting.  The 
idea  is  that  hypotheses,  and,  to  a  greater  degree,  beliefs,  are  valid  for  longer  periods  of 
time than observations; consequently, what can be inferred from hypotheses and beliefs
replaces the need for extensive communication. When information becomes too uncertain,
causing the generation of a large number of low-confidence hypotheses by the
Hypothesis  Generator,  it  is  the  Experiment  Generator  which  causes  a  request  for  a 
state  update  from  a  remote  agent.  On  the  other  hand,  when  a  belief  about  the  local 
state  changes  (which  may  cause  a  number  of  hypotheses  indicating  that  other  agents 
have  incorrect  information),  either  experiments  can  be  generated  (e.g.,  to  inquire 
whether  remote  agents  do  indeed  have  incorrect  information),  or  a  high-level  action 
can  be  generated  (e.g.,  broadcasting  the  new  information  because  of  the  assumption 
that  remote  agents  have  incorrect  information). 

Finally,  the  principle  of  SPACE/TIME  randomization  is  used  to  raise  the 
efficiency  of  hypothesis  generation  and  belief  management.  It  limits  the  number  of 
hypotheses  which  must  be  considered  by  minimizing  the  possibility  of  mutually 
conflicting decisions. Since this also limits the possibility of what an agent may
consider a contradiction (e.g., a locally optimal decision that creates a global
disaster), the rate at which truth maintenance of beliefs takes place is reduced.
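
To make the idea of randomizing over both decisions and time concrete, the following Python sketch picks one of several acceptable targets at random and delays the decision by a random amount. It is only a rough illustration under stated assumptions: the function name, the notion of a candidate list, and the uniform distributions are hypothetical choices, not the technique as defined elsewhere in this dissertation.

```python
import random

def choose_target_and_delay(candidates, max_delay, rng=random):
    """Pick one of several acceptable targets and a random start delay.

    candidates: agents judged to be good choices (an assumed input).
    max_delay:  upper bound on how long the decision may be postponed.
    """
    target = rng.choice(candidates)       # randomize over the decision space
    delay = rng.uniform(0.0, max_delay)   # randomize over time
    return target, delay
```

If several agents run this independently, they are less likely to select the same target at the same instant, which is the kind of mutually conflicting decision the principle is meant to avoid.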

4.9.  Summary  of  Principles 

In  this  chapter,  we  presented  a  set  of  seven  design  principles  for  constructing 
approximate  solutions  to  decentralized  control  problems.  The  principles  are  as  fol¬ 
lows: 

•  Adopt  a  knowledge-based  solution:  incorporate  all  special-case  knowledge 
about  the  problem  as  an  integral  part  of  the  decisionmaking  process. 

•  Apply  knowledge  abstraction:  summarize  information  into  a  form  which  can 
be  utilized  and  communicated  more  efficiently. 

•  Quantify  uncertainty:  explicitly  account  for  information  uncertainty  in 
decisionmaking. 

•  Use  directional  heuristics:  select  decisions  based  on  their  tendencies  to 
increase  utility. 

•  Integrate  information  aging  in  decisionmaking:  condition  expected  state 
utility  on  the  age  of  information. 

•  Communicate frugally: communicate only when the cost of the consequences
of using out-of-date information in decisionmaking exceeds the cost of communication
overhead.

•  Avoid resonances using SPACE/TIME randomization: randomize over
the space of good decisions and over the time during which these decisions can be
made to avoid mutually conflicting decisions between agents.

We also presented a framework for the design of intelligent agents, which is
based  on  our  principles. 


CHAPTER  5 


DECENTRALIZED  LOAD  BALANCING 


In  this  chapter,  we  consider  the  general  problem  of  balancing  the  load  over  multi¬ 
ple  computers.  Although  there  are  many  formulations  of  the  load-balancing  problem 
(see  [Casa88]  for  a  survey),  we  will  limit  ourselves  to  ones  where  control  is  decentral¬ 
ized;  all  computers  take  part  in  making  load-balancing  decisions.  This  problem  is 
used  as  a  vehicle  for  demonstrating  the  application  of  the  principles  and  techniques 
outlined  in  the  previous  chapter.  We  will  use  the  formalism  developed  in  Chapter  3 
to  describe  the  problem  in  more  precise  terms.  The  reader  may  wish  to  refer  to  the 
model  summary  in  Section  3.2  during  the  following  discussion.  This  chapter  is  limited 
to  a  discussion  of  applying  our  methods  to  the  general  problem  of  load  balancing.  In 
the  next  chapter,  we  will  present  results  of  experiments  for  a  particular  load-balancing 
environment. 

The  chapter  is  organized  as  follows.  In  Section  5.1,  we  give  a  brief  description  of  the 
load  balancing  problem  using  the  formalism  developed  in  Chapter  3.  In  Section  5.2, 
we discuss the design of an abstract state space appropriate for load balancing and
how the decision space affects the design. In Section 5.3, we present types of
domain-specific knowledge which are useful. In Section 5.4, we discuss the design of state
transition  models.  In  Section  5.5,  we  present  measures  for  comparing  the  desirabilities 
of  agents.  Finally,  in  Section  5.6,  we  discuss  the  rational  decisionmaking  process  of  a 
load-balancing  agent. 

5.1.  Formal  Description 

An agent $A_i \in A$ is a computer system supporting the execution of jobs. A job is
some finite amount of work $w \in W$. For each job it receives, an agent $A_i$ makes a
load-balancing decision $d_i \in D_i$ to execute the job itself, or to transfer it over a
network to some other agent $A_j$. The global objective is to minimize the average time a job is
delayed  during  its  execution  due  to  its  contention  with  other  jobs  seeking  execution 
and  due  to  its  possible  network  transmission  from  one  agent  to  another  if  offloaded. 
We  will  make  this  objective  more  precise  after  defining  the  abstract  state  spaces  of 
agents. 

5.2.  Abstract  State  Space 

We mentioned in Section 4.2 that, in general, an agent’s low-level state space $X_i$
is never used directly in decisionmaking. Rather, an abstract state space, where each
state represents a (possibly large) number of low-level states, is needed. The
construction of this abstract state space is driven by the needs of decisionmaking. It must
discriminate the possible situations which could trigger any of the agent’s possible
decisions. Thus, let us identify these possible decisions.

We  distinguish  between  two  types  of  decisions  an  agent  will  make:  problem  deci¬ 
sions  and  communication  decisions.  Problem  decisions  are  those  which  have  to  do 
specifically  with  the  particular  problem  at  hand,  in  this  case,  load  balancing.  The 
decision to execute a job locally, or to transfer it to another agent, is a problem
decision. An agent’s decision rule $\gamma_i$ pertains to problem decisions.

Communication  decisions  are  those  having  to  do  with  maintaining  global  state 
information  for  each  agent.  The  decision  to  send  an  update  of  the  local  state  to 
remote agents, and the decision to inquire about a remote agent’s state are
communication decisions. These decisions are common for agents taking part in any type of
decentralized control (although parameters such as the update period will be
problem-specific). Thus, they are dealt with separately, using a mechanism independent of the
problem decision rule $\gamma_i$.

Given  the  stated  load-balancing  decisions,  an  agent  must  be  able  to  discriminate 
among  the  following  situations.  It  must  recognize  whether  it  is  overloaded  or  not,  to 
decide  whether  it  should  execute  a  job  locally.  It  must  recognize  whether  remote 
agents  are  underloaded  or  not,  to  decide  where  to  transfer  a  job  if  it  is  overloaded. 
Thus,  we  see  there  are  at  least  three  necessary  abstract  states:  overloaded,  under¬ 
loaded,  or  neither,  which  will  be  called  normal. 

How  are  these  abstract  states  recognized?  Ultimately,  an  agent  must  be  able  to 
reflect  on  some  aspect  of  its  low-level  state,  and  quickly  determine  the  correct  abstract 
state. Such a readily accessible portion of the low-level state was defined as an
indicator, $I(x_i)$. A good indicator of load for load balancing is the number of jobs ready for
execution (see [Ferr86] for an investigation of load indices for load balancing). Thus,
we will require agents to maintain in their primary memory, for quick access, the
current value of this number, which we call the instantaneous number of ready jobs,
denoted by $R_i$.

The  instantaneous  number  of  ready  jobs  was  chosen  as  an  indicator  for  a  number 
of  important  reasons.  First,  since  it  is  a  single  quantity,  it  allows  differentiation 
between  the  necessary  abstract  states  using  the  following  simple  rules: 

if $\bar{R}_i < R_u$ then underloaded
else if $\bar{R}_i > R_o$ then overloaded
else normal

$R_u$ and $R_o$ are experimentally determined thresholds, and $\bar{R}_i$ is a time-average over
recent past values of $R_i$ (to be discussed shortly). Second, it has relationships to
communication overhead and expected job delay, two quantities whose importance will
become apparent below. These relationships are easily determined, and will be used
for predictive purposes. Third, a simple stochastic model describing how $R_i$ changes
over time, also to be used for prediction, can be conveniently realized. Fourth, and
most importantly, there is strong theoretical support, under assumptions that are not
unreasonable, that $R_i$ is a good indicator of load [Ferr86] (which is also supported by
experimental evidence [Ferr88]).
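
The threshold rules above translate directly into code. The following Python sketch is illustrative; the function and parameter names are assumptions, and the threshold values must come from experimentation as stated in the text.

```python
def classify(r_bar, r_u, r_o):
    """Map the time-averaged number of ready jobs to an abstract load state.

    r_bar: time average of the instantaneous number of ready jobs.
    r_u:   underload threshold; r_o: overload threshold (r_u <= r_o).
    """
    if r_bar < r_u:
        return "underloaded"
    if r_bar > r_o:
        return "overloaded"
    return "normal"
```

Note that values exactly equal to either threshold fall into the normal state, since both comparisons are strict.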

As presented so far, the abstract state $y_i(t)$ could simply be defined as the value
of $I(x_i(t))$, which equals $R_i(t)$; i.e., the current state could just be the current
instantaneous number of ready jobs. This would not work well, though, in a system where $R_i$
changes rapidly. A common scenario is that $R_i$ fluctuates rapidly about a more slowly
changing  fundamental  variation.  (We  have  observed  this  in  our  experiments 
described  in  Chapter  6.  Similar  observations  are  reported  in  [Ferr88]  and  [Zhou87].) 
It  is  this  fundamental  variation  which  is  of  most  interest. 

One way of extracting this fundamental variation is to consider the sequence
$R_i(t), R_i(t+T_s), R_i(t+2T_s), \ldots$, where $T_s$ is the sampling period for $I(x_i)$. Note that
$I(x_i)$ can change at the rate at which $x_i$, the low-level state, changes. Even if it were
possible, monitoring every change in $I(x_i)$ would be unnecessary; hence the reason for
sampling. $T_s$ should be small enough that the Nyquist criterion is satisfied, i.e., that
$1/T_s$ is greater than twice the base frequency of $R_i$. Standard Fourier analysis
techniques can be used to determine this frequency. (Note that a very high base
frequency would indicate a poor choice of indicator.)

If this sequence is considered as a time series $R_i(n), R_i(n+1), R_i(n+2), \ldots$,
where $n$ is the index of the sampling period, filtering techniques can be applied to
remove very high frequency components, leaving only the fundamental components.
In Section 4.5.2, we discussed the use of a moving average and autoregression
techniques for filtering. We will use the following simple autoregressive model (based on
(4.4)):

$$\bar{R}_i(n) = \omega \, \bar{R}_i(n-1) + (1-\omega) \, R_i(n), \qquad (5.1)$$

where $\omega$ is a constant between zero and one. This has the effect of averaging samples
of the time series of the instantaneous number of ready jobs over the recent past.
How much weight is given to the recent past is determined by the value of $\omega$.

Define the number of ready jobs $B_i(n)$ for time period $n$ as the closest integer to
$\bar{R}_i(n)$,

$$B_i(n) = \mathrm{ROUND}(\bar{R}_i(n)). \qquad (5.2)$$

Finally, the value of agent $A_i$’s local state at time $t$ is simply the number of ready
jobs during the time period $n$ which contains $t$,

$$y_i(t) = B_i(n), \quad nT_s \le t < (n+1)T_s. \qquad (5.3)$$
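
The filtering and rounding steps of (5.1) and (5.2) can be sketched in a few lines of Python. The function names, the `initial` starting value for the average, and the use of round-half-up for ROUND are assumptions made for illustration; the text does not specify how ties are rounded or how the average is initialized.

```python
def smooth(samples, omega, initial=0.0):
    """Apply the autoregressive filter of (5.1) to sampled ready-job counts.

    samples: instantaneous numbers of ready jobs, one per sampling period.
    omega:   weight between zero and one given to the previous average.
    """
    r_bar = initial
    out = []
    for r in samples:
        r_bar = omega * r_bar + (1.0 - omega) * r
        out.append(r_bar)
    return out

def abstract_states(samples, omega, initial=0.0):
    """Apply (5.2): round each smoothed, non-negative value to an integer."""
    return [int(x + 0.5) for x in smooth(samples, omega, initial)]
```

With `omega = 0.5` and three samples of 3 ready jobs starting from an average of zero, the smoothed values are 1.5, 2.25, and 2.625, so the abstract state approaches 3 gradually instead of jumping there at once.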

Therefore, the abstract state space $Y_i$ is the set of non-negative integers, up to some
maximum $B_{max}$. The size of $Y_i$ is limited because of the finiteness of the memory of
real machines. If agents are heterogeneous, they can have different values for $B_{max}$.
We will assume a homogeneous set of agents to simplify our discussion; therefore, they
will all have the same value for $B_{max}$. Thus,

$$Y_i = \{0, 1, 2, \ldots, B_{max}\}, \quad \text{for all } i.$$

For purposes of exposition, we have so far ignored the distinction made in
Section 4.5.2 between the state space $Y_i$ and the measure $M(y_i)$, $y_i \in Y_i$, defined over the
same state space. We now make that distinction precise. The current abstract state
$y_i(t)$ can take on, as its value, any of the state identifiers in $Y_i$. When necessary, we
will use the symbol $\theta$ to refer to one of these state identifiers. Each of these states also
has a numerical value, defined by the measure $M(\theta)$. $M$ simply maps state $\theta$ to the
integer value it represents. When necessary, we will use the symbol $\beta$ to refer to one
of these integers.

Summarizing, the current abstract state $y_i(t)$ is $\theta$, where $M(\theta) = \beta$ is agent $A_i$’s
number of ready jobs, as defined by (5.1) and (5.2). The abstract state space $Y_i$ is
the set of symbols representing all the possible values of $B_i$, which are the
non-negative integers up to some maximum value $B_{max}$.

Since  the  symbols  of  the  state  space  and  the  values  of  the  measure  defined  over 
that  space  have  a  one-to-one  correspondence,  there  is  no  need  to  burden  the  reading 
of  this  text  by  maintaining  this  distinction,  so  long  as  the  reader  understands  that  it 
does  exist.  In  the  few  cases  where  the  distinction  must  be  made,  the  reader  will  be 
alerted. 

5.3.  Domain-specific  Knowledge

To  conquer  uncertainty,  agents  are  given  knowledge  specific  to  the  domain  of 
load  dynamics.  This  knowledge  is  gathered  by  observing  variables  of  interest,  such  as 
$B_i$, and noting specifically how they change over time. Most of this knowledge can be
acquired  offline,  applying  time-series  analysis  to  past  histories  of  these  variables. 

Knowledge  can  take  on  many  forms,  which  depend  on  the  problem.  Our  analyses 
will  produce  some  knowledge  in  the  form  of  a  number  of  models  of  load  dynamics. 
An  agent  can  use  these  models  to  predict  what  state  the  system  is  in,  given  past  state 
information.  Since  it  is  desirable  that  the  reliability  of  the  predictions  be  quantifiable, 
probabilistic  models  are  used.  These  models  are  convenient  since  they  encode  very 
economically  a  great  deal  of  knowledge  about  agent  activity. 

Unfortunately,  not  all  knowledge  can  be  expressed  succinctly  in  terms  of  simple 
models.  For  example,  when  an  agent  receives  a  job,  it  must  make  a  prediction  about 
how  much  processing  time  the  job  will  need.  This  can  be  done  by  consulting  a  table, 
which  has  recorded  the  amount  of  processing  time  that  similar  jobs  needed  in  the 
past.  This  table  is  really  a  set  of  special-case  rules,  each  one  stating:  if  this  is  a  job  of 
type  i,  then  it  will  need  y  units  of  processing  time.  Consequently,  information  which 
is  unstructured,  and  cannot  be  generalized  as  a  parametric  model,  can  always  be 
expressed  as  a  set  of  special-case  rules. 
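
A table of special-case rules of this kind can be sketched as a simple lookup. The job types, the recorded times, and the fallback value below are hypothetical placeholders chosen for illustration; they are not data from our experiments.

```python
DEFAULT_ESTIMATE = 1.0  # hypothetical fallback, in units of processing time

# Hypothetical table: job type -> processing time similar jobs needed before.
processing_time_table = {
    "compile": 12.0,
    "edit": 0.5,
    "format": 3.0,
}

def predict_processing_time(job_type, table=processing_time_table):
    """Special-case rule: a job of a known type needs its recorded time."""
    return table.get(job_type, DEFAULT_ESTIMATE)
```

Each table entry plays the role of one rule; a job type that has never been observed falls back to the assumed default.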

Finally,  we  note  that  knowledge  can  be  represented  by  rules  which  indicate  which 
models  to  use.  Thus,  from  a  large  set  of  models,  where  each  model  applies  to  a 
limited situation, a selection can be made based on rules that test which situation
applies. Even this form of knowledge is used in our load-balancing study.

5.4.  Designing  State-transition  Models 

An  agent’s  abstract  state  space  is  defined  to  be  the  set  of  non-negative  integers, 
up  to  some  maximum  value,  and  each  state  represents  a  possible  value  of  the  number 
of  ready  jobs  in  the  agent.  Given  the  value  of  a  past  state,  an  agent  can  predict  a 
future  state  using  a  state-transition  model.  Our  model  will  be  probabilistic  rather 
than  deterministic;  that  is,  it  will  provide  for  a  set  of  possible  future  states,  with 
assigned  probabilities.  These  probabilities  quantify  the  reliability  of  the  model’s  pred¬ 
ictions. 

We  will  assume  that  state  changes  occur  at  discrete  points  in  time,  with  some 
period $T$ as the discretization period. (This period $T$ need not necessarily be the same
as the sampling period $T_s$, although it is convenient when $T$ is a multiple of $T_s$. To
simplify our discussion, we will assume that $T = T_s$.)

When we speak of agent $A_i$ going through a sequence of states $\theta_0, \theta_1, \theta_2, \ldots$ starting at
time $t$, we mean that $y_i(t) = \theta_0$, $y_i(t+T) = \theta_1$, $y_i(t+2T) = \theta_2$, and so on. In Section 4.5.2, we
discussed  the  relationship  between  the  abstract  state  space  and  the  period  T,  and  saw 
that  the  larger  this  period  T,  the  smaller  the  rate  of  state  change.  A  large  T  is  desir¬ 
able  for  many  reasons,  most  importantly  that  uncertainty  grows  more  slowly  in  time  if 
the  rate  of  state  change  decreases.  The  slower  the  increase  in  uncertainty,  the  greater 
the  confidence  in  inferences  about  the  current  state  based  on  past  state  information, 
and  the  lesser  the  need  to  update  state  information  through  communication. 

A particularly useful model is one which has the Markovian property: the
probability distribution of the next state $y_i((n+1)T)$, given the past states $y_i(0) = \theta_0$,
$y_i(T) = \theta_1$, $\ldots$, $y_i((n-1)T) = \theta_{n-1}$, and the current state $y_i(nT) = \theta_n$, depends only on
the value of the current state, not on those of past states. A first-order Markov model
can be conveniently represented as a matrix of one-step transition probabilities $P$,
with the $n$-step transition matrix given by $P^n$. This allows one to derive a set of
possible states, along with their probabilities, given any past state, which is exactly what
our agents must be able to do.
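
The use of an n-step transition matrix for prediction can be illustrated with a short Python sketch. The function names and the list-of-lists matrix representation are assumptions made for illustration, and the two-state matrix in the usage note is a made-up example rather than a model from our experiments.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def n_step(p, n):
    """Return the n-step transition matrix, i.e., p raised to the power n."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result

def predict(p, past_state, n):
    """Distribution over current states, given the state n periods ago."""
    return n_step(p, n)[past_state]
```

For example, with `p = [[0.5, 0.5], [0.0, 1.0]]`, calling `predict(p, 0, 2)` gives the probabilities of being in each state two periods after state 0 was observed.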

5.4.1.  Load  Levels  and  Degree  of  Variability 

Load  balancing  deals  with  the  natural  fluctuations  of  the  loads,  and  their  conse¬ 
quent  imbalances  among  different  agents,  by  redistributing  them.  These  fluctuations 
are  caused  by  the  unpredictable  increases  and  decreases  in  the  rate  of  job  arrivals,  and 
the  unpredictable  amount  of  work  each  job  brings  in.  We  make  the  following 
assumption,  based  on  empirical  evidence  from  our  experiments  described  in  Chapter 
6,  about  the  pattern  of  load  fluctuations:  the  load  does  not  change  in  a  continuous 
fashion;  rather,  it  remains  constant,  at  some  load  level  for  an  unpredictable  interval  of 
time,  after  which  it  changes  to  a  new  load  level,  where  it  then  remains  constant  for 
another interval of time until the next change. (See Figure 6.6 in Chapter 6 for an
illustration of this behavior.) The lengths of time over which these changes take place
are very short relative to the intervals of time the load remains constant. This is a
good example of domain-specific knowledge, of which our agents will take advantage.

Note  that  the  load  level  and  the  fundamental  component  of  the  variation  in  the 
number  of  ready  jobs  as  defined  in  Section  5.2  are  not  the  same.  The  purpose  of  iden¬ 
tifying  the  fundamental  component  was  to  remove  high-frequency  components  from 
the  time  series  of  instantaneous  number  of  ready  jobs  by  taking  a  short-term  average. 
As  for  the  notion  of  load  level,  it  embodies  the  idea  of  plateaus  within  the  long-term 
fluctuation  of  load.  Their  relationship  manifests  itself  as  the  relatively  continuous 
movement  of  the  fundamental  component  of  variation  about  a  temporarily  fixed  load 
level.  (See  Figure  6.6  in  Chapter  6  for  an  illustration  of  the  relationship  between  the 
fundamental  component  and  the  load  level.) 

To make this more concrete, we now pose it in terms of how an agent’s abstract
state changes. The time behavior of load $B_i$ will be characterized by two quantities: a
long-term average of $B_i$, denoted by $L_i$; and the average difference between $B_i$ and $L_i$,
denoted by $V_i$. The idea is that $L_i$ represents the load level, which should remain
constant until a significant change in load takes place, and $V_i$ is a measure of the
degree of variability of $B_i$ about $L_i$.

Let $MA_L(n)$ be a moving average (see (4.3)) of the past $N_L$ loads,

$$MA_L(n) = \sum_{k=0}^{N_L} \omega_k \, B_i(n-k), \qquad (5.4)$$

and let $MA_V(n)$ be a moving average of absolute differences between $B_i$ and $L_i$ for
the past $N_V$ loads,

$$MA_V(n) = \sum_{k=0}^{N_V} \omega'_k \, | B_i(n-k) - L_i(n-k) |. \qquad (5.5)$$

The symbols $\omega_0, \omega_1, \ldots$ and $\omega'_0, \omega'_1, \ldots$ represent weights of constant value, which are to
be determined by design and experimentation.
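
The two moving averages of (5.4) and (5.5) can be sketched as follows. The function names and the convention that histories are stored with the most recent value last are assumptions made for illustration; the weights in the usage note are arbitrary examples.

```python
def moving_average(values, weights):
    """Weighted moving average in the form of (5.4).

    values:  history, with the most recent entry last.
    weights: w_0, w_1, ..., applied from the most recent value backwards.
    """
    recent_first = values[::-1]
    return sum(w * v for w, v in zip(weights, recent_first))

def variability_average(loads, levels, weights):
    """Form of (5.5): weighted average of |load - load level| per period."""
    diffs = [abs(b - l) for b, l in zip(loads, levels)]
    return moving_average(diffs, weights)
```

For instance, `moving_average([1, 2, 3], [0.5, 0.25, 0.25])` weights the latest load of 3 most heavily and yields 2.25.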

Let $H_L$ be the minimum significant change in $B_i$, according to some definition of
significance. Similarly, let $H_V$ be the minimum significant change in the variation of
$B_i$ about the load level. Finding good values for $H_L$ and $H_V$ will depend on
experimentation. Good values for $H_L$ and $H_V$ will cause $L_i$ and $V_i$ to change slowly (e.g.,
with a period much larger than period $T$), and they will both represent long-term
averages about which $B_i(n)$ and $B_i(n) - L_i(n)$, respectively, fluctuate.

We approximate $L_i$ by rounding the value of $MA_L(n)$ to the nearest multiple of
$H_L$. To avoid fluctuations of $L_i$ when $MA_L(n)$ varies closely about the midpoint
between multiples of $H_L$, hysteresis is to be added. We thus define $L_i$ as a function of
$n$ as follows:

$$L_i(n) = \begin{cases} \mathrm{ROUND}(MA_L(n), H_L) & \text{if } MA_L(n) > L_i(n-1) + H_L/2 + h \\ \mathrm{ROUND}(MA_L(n), H_L) & \text{if } MA_L(n) < L_i(n-1) - H_L/2 - h \\ L_i(n-1) & \text{otherwise} \end{cases} \qquad (5.6)$$


To compute $V_i$, recall that it represents a measure of $B_i$’s variation about $L_i$.
When $L_i$ changes (i.e., when $L_i(n) \ne L_i(n-1)$), a new value for $V_i$ must be
established, regardless of its previous value. Let $N_c(L_i(n))$ be the number of time periods
during which $L_i$ has not changed (i.e., $L_i(n) \ne L_i(n - N_c(L_i(n)) - 1)$ but
$L_i(n) = L_i(n-k)$ for $0 \le k < N_c(L_i(n))$). Thus, in computing the moving average
$MA_V(n)$, $N_V$ is the minimum of $N_L$, the number of past states used in the $MA_L$
moving average computation, and $N_c(L_i(n))$, the amount of time $L_i$ has remained
constant.


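Consequently, we will define $V_i(n)$ as follows:

$$V_i(n) = \begin{cases} \mathrm{ROUND}(MA_V(n), H_V) & \text{if } MA_V(n) > V_i(n-1) + H_V/2 + h \\ \mathrm{ROUND}(MA_V(n), H_V) & \text{if } MA_V(n) < V_i(n-1) - H_V/2 - h \\ V_i(n-1) & \text{otherwise} \end{cases} \qquad (5.7)$$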
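
Definitions (5.6) and (5.7) share the same hysteresis rule, which the following Python sketch illustrates. The function names are assumptions, the rounding of exact midpoints upward is an assumed convention, and inputs are taken to be non-negative, as load levels and variabilities are.

```python
def round_to_multiple(x, step):
    """ROUND(x, step): the multiple of step nearest to x (for x >= 0)."""
    return step * int(x / step + 0.5)

def update_with_hysteresis(ma, previous, step, h):
    """Common form of (5.6) and (5.7).

    ma:       current moving average.
    previous: value of the quantity in the previous period.
    step:     minimum significant change.
    h:        hysteresis margin added beyond the midpoint step / 2.
    """
    if ma > previous + step / 2 + h or ma < previous - step / 2 - h:
        return round_to_multiple(ma, step)
    return previous
```

With `step = 2`, `h = 0.25`, and a previous level of 4, a moving average of 5.1 leaves the level at 4, whereas 5.3 crosses the upper bound of 5.25 and moves the level to 6.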


5.4.2.  The  Agent  State-transition  Model 

With  this  characterization  of  the  load  in  terms  of  a  load  level  and  a  degree  of 
variability in mind, we can now design an agent’s state transition model. The model
should allow an agent $A_i$ to predict the possible states of another agent $A_j$, based on
$A_i$’s most recent reception of information about $A_j$.

Recall from Section 3.2 that $K_{ji}(t)$ represents $A_i$’s current state information about
$A_j$. Rather than sending each other the value of their number of ready jobs $B_i(n)$,
our agents will exchange their load levels and degrees of variability, i.e., $L_i(n)$ and
$V_i(n)$. This information is in a sense more valuable than simply the value of $B_i(n)$,
as that value is true for a single point in time, but may change very quickly (within
one period $T$). The load level and the degree of variability change much more slowly,
and therefore do not need to be communicated as often. Consequently, the prediction
of the current state based on the load level and on the degree of variability is likely to
be more effective than if it were based simply on information about the most recent
state.

Thus,  we  define 

—  ( Lj(n  —  aj{ ),  Vj(n— ay,)), 

where  t  =  nT,  ajt  =  a^T,  and  t  -  ay,  is  the  time  Ay  recorded  this  information  and 
subsequently  sent  it  to  A,-.  Thus,  ay,  is  the  age  of  this  information  in  units  of  time, 
and  ay,  is  the  same  age  in  terms  of  time  periods. 
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The  agent’s  state-transition  model  should  say,  given  K;t{t),  what  possible  states 
A:  is  currently  in,  and  what  the  respective  probabilities  are.  This  model  should  have 
the  following  properties: 

(1)  it  should  be  Markovian; 

(2)  given  that  L:{n)  =  A  has  not  changed  in  the  interval  of  time  periods  [n-ay,-,  n], 
the  probability  that  Bj(n)  is  greater  than  A  should  equal  the  probability  that 
Bj(n)  is  less  than  A.  More  generally,  we  would  like  E(Bj(n)  |  I;(n-a;i)  =  A)  = 

A; 

(3)  the  variance  of  (By(n)  |  T,(n-ay,),  should  grow  with  increasing  F,(n),  and  with 
increasing  ay,  . 

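The actual state-transition model is given by

$$p(B_j(n) = \beta \mid L_j(n-k) = \lambda,\ V_j(n-k) = v) = [P_v^k]_{\lambda\beta}, \tag{5.8}$$

where $[P_v^k]_{\lambda\beta}$ is the element in row $\lambda$ and column $\beta$ of the $k$-step state transition
probability matrix $P_v^k$. $P_v$ is the one-step transition matrix of size $B_{\max} \times B_{\max}$, defined
by

$$
P_v =
\begin{bmatrix}
\frac{1+p_v}{2} & \frac{1-p_v}{2} & 0 & \cdots & 0 & 0 \\
\frac{1-p_v}{2} & p_v & \frac{1-p_v}{2} & \cdots & 0 & 0 \\
0 & \frac{1-p_v}{2} & p_v & \cdots & 0 & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & p_v & \frac{1-p_v}{2} \\
0 & 0 & 0 & \cdots & \frac{1-p_v}{2} & \frac{1+p_v}{2}
\end{bmatrix}
\tag{5.9}
$$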


This model has many good qualities: it is simple; it has the properties described
above; it can be stored efficiently; and it can be evaluated efficiently (or pre-evaluated
and stored as a table, for quick lookup afterwards). The parameter $p_v$ is the probability
that $A_j$ will remain in the same state after one transition (except for states 0 and
$B_{\max}$, for which the probability of remaining in these states after one transition is
$(1 + p_v)/2$). It is a decreasing function of $V_j$ (hence the subscript $v$),

$$p_v = f(V_j), \qquad \frac{df}{dV_j} < 0. \tag{5.10}$$

Thus, the greater the degree of variability, the greater the probability of moving
to a higher or lower state, and therefore the lesser the probability of remaining in the
same state. The exact relationship between $p_v$ and $V_j$ is determined experimentally.
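To make the model of (5.8) and (5.9) concrete, the following Python sketch builds the one-step matrix and evaluates one row of its $k$-th power. It is a sketch under assumptions, not the implementation used in the experiments: states are taken to be $0, 1, \ldots, B_{\max}$ (so the matrix built here has $B_{\max}+1$ rows), $p_v$ is passed in directly because the function $f$ of (5.10) is determined experimentally, and the function names are hypothetical.

```python
def one_step_matrix(p_v, b_max):
    """One-step transition matrix P_v over states 0..b_max (assumed state space)."""
    n = b_max + 1
    q = (1.0 - p_v) / 2.0          # probability of moving up (or down) by one
    P = [[0.0] * n for _ in range(n)]
    for s in range(n):
        P[s][s] = p_v
        if s > 0:
            P[s][s - 1] = q
        else:
            P[s][s] += q           # state 0 stays put with probability (1 + p_v)/2
        if s < b_max:
            P[s][s + 1] = q
        else:
            P[s][s] += q           # state b_max stays put with probability (1 + p_v)/2
    return P

def state_distribution(p_v, b_max, level, age):
    """Row `level` of P_v**age: the distribution of B_j(n) given L_j(n - age) = level."""
    P = one_step_matrix(p_v, b_max)
    n = b_max + 1
    dist = [0.0] * n
    dist[level] = 1.0
    for _ in range(age):
        dist = [sum(dist[r] * P[r][c] for r in range(n)) for c in range(n)]
    return dist

def mean_and_variance(dist):
    mean = sum(b * m for b, m in enumerate(dist))
    var = sum((b - mean) ** 2 * m for b, m in enumerate(dist))
    return mean, var
```

For a reported level in the middle of the state space, `mean_and_variance(state_distribution(p_v, b_max, level, age))` returns a mean equal to the reported level and a variance that grows with `age`, which are properties (2) and (3) above.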

We  call  this  model  the  steady-state  response  model,  in  that  it  bases  its  prediction 
on  the  fact  that  the  remote  agent’s  load  is  assumed  to  be  at  the  same  load  level  since 
the  reception  of  the  load  level  information.  Thus,  the  agent  is  in  a  steady  state:  to 
remain  at  the  same  load  level,  the  load,  determined  by  the  arrival  rate  of  jobs  from  its 
private  source  and  from  agents  transferring  jobs,  and  the  mix  of  job  types,  has  essen¬ 
tially  remained  the  same. 

An  agent  also  needs  a  model  for  predicting  what  would  happen  if  it  offloaded  a 
job  to  a  particular  agent.  Given  such  a  prediction,  it  can  calculate  whether  utility 
will  go  up  or  down,  and  thus  make  a  rational  decision.  For  load  balancing,  our  agents 
use  the  following  simple  model  (so  simple  it  can  be  stated  as  a  single  rule). 

If a job is offloaded to $A_j$, $L_j$ will increase by 1.

We  call  this  model  the  transient  response  model  in  that  it  is  used  to  predict  a 
future  state  based  on  the  expectation  that  the  remote  agent’s  load  level  will  change. 
This  is  in  contrast  to  the  steady-state  response  model,  where  the  load  level  remains 
constant.  Note  that  there  is  an  implicit  assumption  that  an  offloaded  job  will  have  a 
significant  effect  on  the  receiving  agent’s  load  level,  specifically,  that  the  load  level  will 
go  up  by  one.  This  implies  that,  in  general,  the  types  of  jobs  that  are  offloaded  are 
those  that  will  execute  for  a  relatively  long  time,  long  enough  to  affect  the  long-term 
average of $B_j$.

5.5.  Measures  for  Comparing  the  Desirabilities  of  Agents 

Load-balancing decisions require an agent that wants to offload a job to be able
to tell which remote agent, if any, is best for receiving the job. In general, the remote
agent with the "best" state, i.e., the state representing the presence of the smallest
number of ready jobs, is not necessarily the one to which jobs should be offloaded.
The main reason, illustrated in Section 4.5.3, is the uncertainty of information
about these remote agents, and the nonlinear relationship between the measure of an
agent's state and its utility. This results in situations where, for example, a remote
agent with a reported load level of 4, which was reported 1 second ago, may be more
desirable as the destination than a remote agent with a reported load level of 0
which was reported 20 seconds ago. There is also the factor of distance: an agent
which is far away is generally less desirable than a close one. Thus, we need a way of
measuring the utility of an agent, given the uncertainty of the information about its
number of ready jobs and its distance.




5.5.1.  Computing  an  Agent’s  State  Utility 

Let  us  first  consider  the  utility  of  the  states  of  an  agent.  The  objective  is  to 
minimize  job  delay;  thus,  we  will  define  utility  to  be  a  measure  of  negative  expected 
job  delay,  given  the  agent’s  state  which  represents  its  number  of  ready  jobs.  (Utility 
is  a  quantity  to  be  maximized,  and  job  delay  is  to  be  minimized,  hence  the  reason  for 
equating  utility  with  negative  job  delay.) 

Let $elap(w, \beta)$ be the total elapsed time of job $w$, given $\beta$, which is the time-averaged
number of ready jobs during the lifetime of job $w$.

$$elap(w, \beta) = \text{total elapsed time}.$$

In general, $elap(w, \beta)$ increases with increasing $\beta$, for a given job $w$. Since we
assume a homogeneous set of agents, the $elap$ function is the same for all of them.

A job's execution time, the total time spent by an unloaded machine executing it,
is

$$elap(w, 0) = \text{total execution time},$$

since there are no other jobs to interrupt job $w$. Thus, the factor by which job $w$'s
elapsed time increases when there are a total of $\beta$ jobs (including $w$) contending for
the machine is

$$\frac{elap(w, \beta)}{elap(w, 0)} \tag{5.11}$$

which we call the stretch factor. We define the quantity $\mu(\beta)$ as the negative stretch
factor. The value of $\mu(\beta)$ reaches its maximum (which equals 0) when $\beta = 0$, and
decreases with increasing $\beta$. We now determine its shape.

To construct $\mu(\beta)$, we must have a model of how jobs get control of the CPU. In
particular, this model must tell us how much time is spent executing all jobs, and how
operating system overhead is accounted for. Suppose that there are $\beta$ compute-bound
jobs, and that each takes the same amount of time $elap(w, 0)$ to complete if any one of
them were placed in an empty system. But since there are $\beta$ of them, we would
expect them to take at least $\beta \cdot elap(w, 0)$ to complete, given pure processor-sharing
scheduling. Thus, we have

$$elap(w, \beta) \ge \beta \cdot elap(w, 0).$$

This would be an equality if there were no overhead, which is never the case in real
systems.

Let us use a very simple model for overhead. Let

$$f_{overhead} = \frac{\beta}{\beta_{\max}} < 1 \tag{5.12}$$

be the fraction of time attributed to overhead functions by the CPU. It is a linear
function of the number of ready jobs: the more jobs it has to schedule for execution,
the more overhead. $\beta_{\max}$ is a constant representing the number of jobs which causes
the CPU to spend all its time scheduling, therefore never getting any work done.
(Note that $\beta_{\max}$ must be greater than $B_{\max}$, the maximum number of ready jobs,
otherwise an agent might get into an undesirable state where it would never get any work
done.)

Given a number $\beta$ of ready jobs, the fraction of time the CPU can spend executing
jobs is

$$f_{executing} = 1 - \frac{\beta}{\beta_{\max}}. \tag{5.13}$$


Using these models, we can now say that a job $w$, in a system where the number of
ready jobs is $\beta$, would take the following amount of time to complete.

$$elap(w, \beta) = \frac{\beta \cdot elap(w, 0)}{1 - \beta/\beta_{\max}} \tag{5.14}$$

The function $\mu(\beta)$ then becomes

$$\mu(\beta) = \frac{-\beta}{1 - \beta/\beta_{\max}} \tag{5.15}$$


Given $\mu(\beta)$, we define the local utility of agent $A_i$ as

$$u_i(\theta) = \mu(\beta), \quad M(\theta) = \beta, \quad \text{for all } i, \tag{5.16}$$

where $A_i$ is in state $\theta$, and $M(\theta)$, the measure of state $\theta$, is the numerical value $\beta$.
Since our agents are homogeneous, they all share the same local utility function, given
by $\mu(\beta)$.

Note  that  utility  is  not  a  linear  function  of  the  agent’s  state.  This  has  a  pro¬ 
found  effect  on  decisionmaking  when  remote  agents’  states  are  not  known  with  com¬ 
plete  certainty.  We  explore  this  in  more  detail  in  Chapter  6. 
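The nonlinearity of (5.15) can be seen numerically with a small Python sketch. The function names and the sample value of $\beta_{\max}$ used below are illustrative assumptions, not values taken from the experiments.

```python
def stretch_utility(beta, beta_max):
    """Negative stretch factor mu(beta) of (5.15); requires 0 <= beta < beta_max."""
    if not 0 <= beta < beta_max:
        raise ValueError("beta must satisfy 0 <= beta < beta_max")
    return -beta / (1.0 - beta / beta_max)

def expected_stretch_utility(outcomes, beta_max):
    """Expected mu over a list of (probability, beta) pairs."""
    return sum(p * stretch_utility(b, beta_max) for p, b in outcomes)
```

For instance, with `beta_max = 20`, an agent known to have exactly 4 ready jobs has utility `stretch_utility(4, 20)`, while an agent equally likely to have 0 or 8 ready jobs (the same mean) has the lower value `expected_stretch_utility([(0.5, 0), (0.5, 8)], 20)`. Uncertainty about a state therefore lowers its expected utility even when the expected number of ready jobs is unchanged.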


5.5.2.  Conditional  Utility  with  Respect  to  Job  Offloading 

An agent's state utility indicates its ability to execute jobs, measured in terms of
(the negative of) the expected increase in the job's elapsed time relative to its execution
time. This measure, as defined by (5.15), depends solely on the jobs already
present at that agent, and contending for the CPU.

Consider now the problem of agent $A_i$, who is trying to determine which agent, if
any, is the best destination for a job $w$. Let us assume for a moment that $A_i$ knows,
with complete certainty, the state of every other agent, so that it can compute $u_j(y_j)$,
for all $j$. Further, assume that $A_i$ is the only agent making an offloading decision, so
that a resonance is impossible. Would the best destination agent for $w$ be the one
with the maximal utility?

The problem here is that $A_i$ must compute the utility of a future state of $A_j$,
that which will occur if job $w$ is offloaded there. This computation depends on




knowledge which only $A_i$ has, namely that $w$ is being considered for offloading to $A_j$.
Therefore, $A_j$'s future state utility is conditioned on knowledge possessed by the agent
who is computing the utility. The difference between this conditional utility and the
previous absolute utility is that conditional utility depends on the viewpoint of the
computing agent, whereas absolute utility does not.

There is also the less subtle problem of accounting for network delays. If the
delay in time due to the network transmission of a job from one agent to another is a
significant factor, as is most likely if the agents are distributed over a large geographic
area, conditional utility will be affected. This is because utility was defined as the
negative stretch factor. Previously, this quantity was solely due to CPU contention;
network delay is now a factor to be taken into account. The network delay will generally
be a function of the job size (including the size of data files associated with the
job), and of the distance between the agents.

Thus, agent $A_i$ is no longer just interested in $A_j$'s current state. It is interested in
its future state, conditioned on the probability of offloading a job to it, and conditioned
on the time it takes to transmit the job to it. We now formalize this notion of
conditional utility.

Assume that the function $netdelay_{ij}(w)$, i.e., the time job $w$ is delayed due to network
transmission from $A_i$ to $A_j$, is known. The total elapsed time from the arrival of
$w$ at $A_i$ to the end of $w$'s execution on $A_j$ (which has $\beta$ ready jobs just before $w$'s
arrival) is

$$netdelay_{ij}(w) + elap(w, \beta+1).$$


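The conditional state utility of agent $A_j$ (whose state is $\theta$) from $A_i$'s viewpoint, given
that $A_i$ will offload job $w$ to $A_j$, is

$$u_{ji}(\theta, w) = -\frac{netdelay_{ij}(w) + elap(w, \beta+1)}{elap(w, 0)}, \quad M(\theta) = \beta,\ \beta < B_{\max}. \tag{5.17}$$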


The (negative) ratio of a job's network delay to its execution time is denoted by

$$\eta_{ij}(w) = -\frac{netdelay_{ij}(w)}{elap(w, 0)}. \tag{5.18}$$

Extending definition (5.17) to include $\beta = B_{\max}$, and expressing $u_{ji}(\theta, w)$ in terms of
definitions (5.15) and (5.18), the conditional state utility is defined by

$$
u_{ji}(\theta, w) =
\begin{cases}
\eta_{ij}(w) + \mu(\beta+1), & \beta < B_{\max},\ M(\theta) = \beta, \\
-\infty, & \beta = B_{\max}.
\end{cases}
\tag{5.19}
$$


Since after the transfer of $w$ there will be $\beta+1$ jobs present at $A_j$, we use $\mu(\beta+1)$,
assuming $A_j$'s ready job number, $\beta$, is less than $B_{\max}$. If $\beta = B_{\max}$, a job offloaded to
$A_j$ could not be accepted, and would be returned. We consider this to have the
lowest possible utility, $-\infty$. Note also that $\eta_{ij}(w)$ is independent of $A_j$'s state.
Therefore, conditional utility differs from absolute utility by a constant, but this
constant depends on information known only to the computing agent.

Unfortunately, the conditional utility function forces agents to know, or at least
reasonably predict, the execution time a job will need (to compute $\eta_{ij}(w)$). The
netdelay function must also be known, but this is less difficult to approximate as it is
often a simple linear function of the distance between agents and of the total job size.
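As an illustration of definition (5.19), the following Python sketch evaluates the conditional state utility for a known remote state. It assumes that the network delay and the job's execution time are already available as numbers (the text notes that both must be known or predicted); the function and parameter names are hypothetical.

```python
import math

def mu(beta, beta_max):
    """Negative stretch factor of (5.15)."""
    return -beta / (1.0 - beta / beta_max)

def conditional_utility(beta, b_max, beta_max, netdelay, exec_time):
    """Conditional state utility u_ji of (5.19) for a remote agent with beta ready jobs.

    netdelay  : network delay of the job between the two agents (assumed known)
    exec_time : elap(w, 0), the job's execution time on an unloaded machine
    """
    if beta >= b_max:
        return -math.inf                 # the job could not be accepted
    eta = -netdelay / exec_time          # network-delay term of the utility
    return eta + mu(beta + 1, beta_max)
```

For example, `conditional_utility(3, 8, 20, 2.0, 10.0)` combines a network-delay term of -0.2 with $\mu(4)$, while any call with `beta` equal to `b_max` yields negative infinity.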

5.5.3.  Expected  Utility 

So far in this chapter, we have assumed that agents know each other's states, and
therefore can compute each other's utilities. Of course, this is an unreasonable
assumption, as has been emphasized in the previous chapters. It is true that, in general,
agent $A_i$ does not know with complete certainty that, say, the state of $A_j$ is $\theta$.
But $A_i$ can compute the probability of this, based on past information about $A_j$,

$$p(y_j(t) = \theta \mid K_{ji}(t)). \tag{5.20}$$

Using the usual convention that $t = nT$, $\alpha_{ji} = a_{ji}T$, and $M(\theta) = \beta$, this probability
can also be expressed as

$$p(B_j(n) = \beta \mid L_j(n - a_{ji}),\ V_j(n - a_{ji})). \tag{5.21}$$

Consequently, $A_i$ can compute the expected utility of the state of $A_j$, based on past
information $K_{ji}(t)$, as follows:

$$E[u_j(y_j(t)) \mid K_{ji}(t)] = \sum_{\theta \in Y_j} u_j(\theta) \cdot p(y_j(t) = \theta \mid K_{ji}(t)). \tag{5.22}$$

Since the probabilities in (5.20) and (5.21) are equivalent, (5.22) can be expressed as

$$E[u_j(y_j(t)) \mid K_{ji}(t)] = \sum_{\beta=0}^{B_{\max}} \mu(\beta) \cdot p(B_j(n) = \beta \mid L_j(n - a_{ji}),\ V_j(n - a_{ji})). \tag{5.23}$$

If $A_i$'s information about $A_j$, $K_{ji}(t)$, is that $L_j(n - a_{ji}) = \lambda$ and $V_j(n - a_{ji}) = v$, then
in terms of the state-transition matrix $P_v$ defined by (5.9), the expected utility is

$$E[u_j(B_j(n)) \mid \lambda, v, a_{ji}] = \sum_{\beta=0}^{B_{\max}} \mu(\beta)\,[P_v^{a_{ji}}]_{\lambda\beta}. \tag{5.24}$$

Using  the  expected  utility,  an  agent  can  make  comparisons  between  remote  agents 
using  information  which  is,  to  varying  degrees,  uncertain. 

It  is  interesting  to  see  the  effect  of  expected  utility  as  a  function  of  the  age  of  the 
state  information.  We  will  explore  this  behavior  in  Chapter  6. 
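The expected utility of (5.24) can be evaluated directly by propagating a distribution through the steady-state response model. The Python sketch below does this under stated assumptions: states run from 0 to $B_{\max}$, $p_v$ is supplied as a number rather than derived from $V_j$, and the function names and sample parameter values are illustrative only.

```python
def mu(beta, beta_max):
    """Negative stretch factor of (5.15)."""
    return -beta / (1.0 - beta / beta_max)

def propagate(dist, p_v):
    """One transition of the steady-state response model (equivalent to dist * P_v)."""
    q = (1.0 - p_v) / 2.0
    top = len(dist) - 1
    new = [0.0] * len(dist)
    for s, mass in enumerate(dist):
        new[s] += p_v * mass
        new[max(s - 1, 0)] += q * mass      # mass that would leave state 0 stays there
        new[min(s + 1, top)] += q * mass    # likewise at the highest state
    return new

def expected_utility(level, p_v, age, b_max, beta_max):
    """Expected utility (5.24) given a reported level and its age in periods."""
    dist = [0.0] * (b_max + 1)
    dist[level] = 1.0
    for _ in range(age):
        dist = propagate(dist, p_v)
    return sum(mu(beta, beta_max) * mass for beta, mass in enumerate(dist))
```

With, say, `p_v = 0.6`, `b_max = 10` and `beta_max = 20`, evaluating `expected_utility(2, 0.6, age, 10, 20)` for ages 0, 1, 5 and 20 gives a steadily decreasing sequence: the older the report, the less the reported level is worth.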

5.5.4.  Conditional  Expected  Utility 

We now extend our formulation of the expected utility to include conditional
expected utility. This requires computing the weighted sum of the conditional utilities
for each possible state, weighted by the probability that an agent is in that state.
In formula (5.19), which defines conditional utility, if the probability that the agent is
in state $B_{\max}$ is non-zero, the entire conditional expected utility will be $-\infty$.
Consequently, we will first define the conditional expected utility given that the agent is not
in state $B_{\max}$.

Let $B$ represent the case that $A_j$ is in state $B_{\max}$,

$$B = \{ B_j(n) = B_{\max} \},$$

and let $\bar{B}$ represent the negation of $B$, i.e., the case that $A_j$ is not in state $B_{\max}$. Let
$p_B$ be the probability of $B$ occurring, given $K_{ji}(t)$:

$$p_B = P(B \mid K_{ji}(t)).$$

Define the conditional expected state utility of $A_j$ from $A_i$'s viewpoint, given that $A_i$
will offload job $w$ to $A_j$, and that $A_j$ is not in state $B_{\max}$, as follows.

$$E[u_{ji}(y_j(t), w) \mid K_{ji}(t), \bar{B}] = \frac{1}{1 - p_B} \sum_{\theta \in Y_j \setminus \{B_{\max}\}} u_{ji}(\theta, w) \cdot p(y_j(t) = \theta \mid K_{ji}(t)), \tag{5.25}$$

which equals

$$\frac{1}{1 - p_B} \sum_{\beta=0}^{B_{\max}-1} (\eta_{ij}(w) + \mu(\beta+1)) \cdot p(B_j(n) = \beta \mid K_{ji}(t)). \tag{5.26}$$

Since the expectation sums over values 0 through $B_{\max}-1$ for $\beta$, it must be divided by
the probability that $\beta \ne B_{\max}$, which is $1 - p_B$.

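If $A_i$'s information about $A_j$, $K_{ji}(t)$, is that $L_j(n - a_{ji}) = \lambda$ and $V_j(n - a_{ji}) = v$,
then, in terms of the state-transition matrix $P_v$, the conditional expected utility is

$$E[u_{ji}(B_j(n), w) \mid \lambda, v, a_{ji}, \bar{B}] = \frac{1}{1 - [P_v^{a_{ji}}]_{\lambda B_{\max}}} \sum_{\beta=0}^{B_{\max}-1} (\eta_{ij}(w) + \mu(\beta+1))\,[P_v^{a_{ji}}]_{\lambda\beta}. \tag{5.27}$$

Since $\eta_{ij}(w)$ does not depend on $\beta$, it can be taken out of the summation, and we get

$$E[u_{ji}(B_j(n), w) \mid \lambda, v, a_{ji}, \bar{B}] = \eta_{ij}(w) + \frac{1}{1 - [P_v^{a_{ji}}]_{\lambda B_{\max}}} \sum_{\beta=0}^{B_{\max}-1} \mu(\beta+1)\,[P_v^{a_{ji}}]_{\lambda\beta}. \tag{5.28}$$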

What if $B_j(n)$ does equal $B_{\max}$ (i.e., $B$ is true)? How is this incorporated into
the conditional expected utility function? This will depend on the job offloading policies,
which can vary for different load balancing environments. For example, one possible
policy is that, if $A_i$ offloads a job to $A_j$, who already has the maximum number
of jobs it can support, $A_j$ will indicate to $A_i$ that it cannot accept the job. $A_i$ can
then consider another agent for offloading it, or can retain it for local execution. To
simplify matters, if we assume that $A_i$ will keep the job, the conditional expected utility
given $B_j = B_{\max}$ is

$$E[u_{ji}(B_j(n), w) \mid K_{ji}(t), B] = \eta_{ij}(w) + \mu(B_{\max}). \tag{5.29}$$

By offloading a job to $A_j$ when $A_j$ is in state $B_{\max}$, the state utility of $A_j$ is only
affected in that time was wasted shipping $w$. Since $A_j$ does not accept $w$, its actual
state is unaffected. (This policy can be improved by making $A_i$ first send a request to
$A_j$ and then making $A_i$ wait for a confirmation of acceptance from $A_j$, rather than
shipping the entire job first. In fact, $A_j$ would reply with an update concerning $L_j$
and $V_j$, so that $A_i$ could make a decision based on better information. To keep our
discussion simple, we will not use these improved policies. Once all the models are
defined using the simple policy, it is not difficult to extend them to incorporate the
improvements.)

Note that, although $A_j$'s state, which is $B_{\max}$, is unaffected if $w$ is mistakenly
offloaded to it, the state of some other agent, the one who eventually gets the job for
execution, most certainly will be affected with a decrease in utility.

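We can now give a complete expression for the conditional expected utility.

$$E[u_{ji}(B_j(n), w) \mid \lambda, v, a_{ji}] = \sum_{\beta=0}^{B_{\max}-1} (\eta_{ij}(w) + \mu(\beta+1))\,[P_v^{a_{ji}}]_{\lambda\beta} + (\eta_{ij}(w) + \mu(B_{\max}))\,[P_v^{a_{ji}}]_{\lambda B_{\max}}. \tag{5.30}$$

After some simplification, we get

$$E[u_{ji}(B_j(n), w) \mid \lambda, v, a_{ji}] = \eta_{ij}(w) + \sum_{\beta=0}^{B_{\max}-1} \mu(\beta+1)\,[P_v^{a_{ji}}]_{\lambda\beta} + \mu(B_{\max})\,[P_v^{a_{ji}}]_{\lambda B_{\max}}. \tag{5.31}$$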

Finally, we define the conditional expected utility of $A_i$, given that it offloads job
$w$ to $A_j$, which may or may not be able to accept it. Let

$$E[u_{ii}(y_i(t), w) \mid K_{ji}(t)] = (1 - p_B) \cdot u_i(\beta) + p_B \cdot u_i(\beta + 1), \quad y_i(t) = \beta. \tag{5.32}$$

$A_i$'s conditional expected state utility is based on its local state not changing if $A_j$ is
not in state $B_{\max}$, but that it will change (i.e., its number of ready jobs will increase
by one) if $A_j$ is in state $B_{\max}$.

In terms of $P_v$, $L_j(n - a_{ji}) = \lambda$ and $V_j(n - a_{ji}) = v$, this expectation can also be
expressed as

$$E[u_{ii}(B_i(n), w) \mid \lambda, v, a_{ji}] = (1 - [P_v^{a_{ji}}]_{\lambda B_{\max}}) \cdot \mu(B_i(n)) + [P_v^{a_{ji}}]_{\lambda B_{\max}} \cdot \mu(B_i(n) + 1).$$
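The two conditional expectations, that of the destination agent (5.31) and that of the offloading agent (5.32), can be sketched in Python as follows. This is an illustrative sketch only: it assumes states $0, \ldots, B_{\max}$, takes $p_v$ and $\eta_{ij}(w)$ as given numbers, and uses hypothetical function names.

```python
def mu(beta, beta_max):
    """Negative stretch factor of (5.15)."""
    return -beta / (1.0 - beta / beta_max)

def state_row(level, p_v, age, b_max):
    """Row `level` of P_v**age under the steady-state response model."""
    q = (1.0 - p_v) / 2.0
    dist = [0.0] * (b_max + 1)
    dist[level] = 1.0
    for _ in range(age):
        new = [0.0] * (b_max + 1)
        for s, mass in enumerate(dist):
            new[s] += p_v * mass
            new[max(s - 1, 0)] += q * mass
            new[min(s + 1, b_max)] += q * mass
        dist = new
    return dist

def target_utility(eta, row, b_max, beta_max):
    """Conditional expected utility of the destination agent, as in (5.31)."""
    total = eta
    for beta in range(b_max):                  # destination accepts the job
        total += mu(beta + 1, beta_max) * row[beta]
    total += mu(b_max, beta_max) * row[b_max]  # destination full: its state is unchanged
    return total

def own_utility(own_jobs, row, b_max, beta_max):
    """Conditional expected utility of the offloading agent, as in (5.32)."""
    p_full = row[b_max]
    return (1.0 - p_full) * mu(own_jobs, beta_max) + p_full * mu(own_jobs + 1, beta_max)
```

A typical use would first compute `row = state_row(level, p_v, age, b_max)` from the most recent report about the destination, and then pass the same row to both functions, since both depend on the probability that the destination is full.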

In  summary,  agents  can  use  conditional  expected  utility  as  a  measure  for  compar¬ 
ing  the  desirabilities  of  remote  agents  with  respect  to  offloading  decisions.  This  meas¬ 
ure  accounts  for  the  state  of  a  remote  agent,  including  the  fact  that  such  information 
is uncertain, and for the network delays between agents.

5.6. Making Rational Decisions

How  do  agents  make  rational  decisions?  It  will  depend  on  the  types  of  decisions 
agents  make,  i.e.,  on  whether  they  are  problem  decisions,  which  address  the  transfer  of 
jobs  to  balance  the  load,  or  communication  decisions,  which  address  the  maintenance 
of  an  agent’s  view  of  the  global  state.  We  are  now  ready  to  answer  this  question  for 
each  case.  We  also  consider  a  method  for  avoiding  resonances,  designed  specifically 
for  the  load-balancing  problem. 

5.6.1.  Problem  Decision  Rules 

First, we will consider the construction of an agent's problem decision rule.
Recall from Section 3.2 that an agent's decision $d_i$ is given by the decision rule $\gamma_i$,

$$d_i(t) = \gamma_i(z_i(t),\ s_i(t)).$$

The decision rule for agent $A_i$ is a function of all influences $z_i$ affecting it, and the
newly generated work $s_i$ arriving at $A_i$. The influences are of two types, transferred
work and information. In terms of load-balancing, $A_i$'s decision rule is a function of
the current job arrivals, including those transferred from other agents as well as new
ones, and of $A_i$'s information about other agents, $K_i(t)$. The result of the decision
rule is a load-balancing decision: whether a given job should be offloaded, and, if so,
where.

It  is  the  arrival  of  a  job  that  triggers  an  agent  into  making  a  load-balancing  deci¬ 
sion,  invoking  the  decision  rule.  We  will  assume  that  only  new  jobs  (not  those  which 
have  already  been  transferred)  can  be  offloaded.  More  specifically, 

(1)  a  job  cannot  be  transferred  more  than  once; 

(2)  if  a  job  is  to  be  transferred,  the  transfer  must  occur  before  the  job  has  executed; 

(3)  if  a  job  is  transferred  to  an  agent  which  cannot  accept  the  job,  the  job  is  exe¬ 
cuted  locally. 

Note  that  these  limitations  are  by  choice,  and  not  imposed  by  our  formal  model. 
Having  them  accentuates  the  consequences  of  each  load-balancing  decision  that  an 
agent  makes,  since  later  "corrections"  are  not  allowed.  For  example,  if  the  decision  is 
to  offload  a  job  to  a  particular  remote  agent,  and  that  agent  determines  this  was  not  a 
good  decision,  it  cannot  correct  it  by  further  offloading  the  job  to  another  agent. 
Similarly,  if  the  decision  is  to  execute  a  job  locally,  and  later  it  is  found  that  it  would 
have  been  better  to  have  offloaded  it,  we  cannot  correct  it  by  later  offloading. 

An agent $A_i$ has a decision space $D_i$. Since our agents are homogeneous, they
will have the same decision spaces. Thus, the decision space for $A_i$ is

$$D_i = \{\delta_0, \delta_1, \ldots, \delta_N\},$$

where decision $\delta_k$ means "offload to agent $A_k$," for any $1 \le k \le N$, $k \ne i$; if $k = i$, $\delta_k$ is the
decision to execute the job locally. Decision $\delta_0$ is the null decision, meaning that no
decision is made, and the job is kept locally but not executed (but may be executed at
a later time when another decision is made). Note the difference between $A_i$'s
decision variable $d_i$, which can take on any value in $D_i$, and an actual decision value
$\delta_0$ or $\delta_k$ for $1 \le k \le N$, which is a particular element of $D_i$.

In  order  to  make  a  rational  decision,  agents  must  be  able  to  predict  what  will 
happen  to  global  utility  if  a  particular  decision  is  made.  The  decision  which  produces 
a  maximal  positive  change  of  global  utility  is  the  best  decision.  In  Chapter  4,  we  dis¬ 
cussed  the  problems  of  computing  global  utility;  approximations  were  proposed,  of 
which  we  now  make  use.  The  central  idea  was  to  convert  a  global  optimization  into 
local  optimizations,  reducing  the  chance  of  resonances  (caused  by  the  conversion)  by  a 
separate  mechanism.  We  now  present  the  design  of  an  agent’s  decision  rule.  The 
form  of  the  decision  rule  is  the  same  for  every  agent,  again  due  to  the  homogeneity  of 
agents. 

Agent $A_i$'s local optimization considers the change in utility for each possible
decision $A_i$ can make, ignoring all decisions being made concurrently by other agents.
The payoff for $d_i = \delta_j$, the decision to offload job $w$ to $A_j$ made by $A_i$, is defined as

$$\Lambda(\delta_j, w) = E[u_{ji}(y_j(t), w) \mid K_{ji}(t)] + E[u_{ii}(y_i(t), w) \mid K_{ji}(t)] + \sum_{k \ne i,j} E[u_k(y_k(t)) \mid K_{ki}(t)].$$

It is the sum of the utilities of the states in which every agent is expected to be after
the decision is made. For every agent except $A_j$ and $A_i$, the assumption is that the
state will be the same as what it currently is; therefore, we compute the expected utility
$E[u_k(y_k(t)) \mid K_{ki}(t)]$ of the current state based on the past information. (See Section
5.5 for definitions of the expectations used in the formula for $\Lambda(\delta_j, w)$.)

$A_j$ is expected to change its state if it is not already in state $B_{\max}$: its number of
ready jobs is expected to increase by one since it is to receive a new job for execution.
Therefore, we use $E[u_{ji}(y_j(t), w) \mid K_{ji}(t)]$, the conditional expected utility of the state
$A_j$ will be in after $A_i$ offloads job $w$ to it, defined by formula (5.31).

$A_i$, the decisionmaking agent computing the payoffs, must compute its own state
utility taking into account the fact that, if the job being offloaded is not accepted by
$A_j$, it must be executed locally. Therefore, we use $E[u_{ii}(y_i(t), w) \mid K_{ji}(t)]$, defined by
formula (5.32).

When a new job $w$ arrives at agent $A_i$, $A_i$ computes the payoffs $\Lambda(\delta_j, w)$ for all
$1 \le j \le N$. It then creates a set of decisions

$$D_i(w) = \{ \delta_k : \Lambda(\delta_k, w) \ge \Lambda(\delta_i, w),\ \delta_k \in D_i \}.$$

$D_i(w)$ is the set of all decisions which $A_i$ can make that have a payoff greater than or
equal to that of the decision to execute job $w$ locally. It will then make a randomized
selection, to minimize the probability of a resonance, from $D_i(w)$. That will be $A_i$'s
decision concerning job $w$.

5.6.2. SPACE/TIME Randomization for Load Balancing

To  avoid  the  situation  where  agents  that  have  a  job  to  transfer  all  select  the 
minimally  loaded  agent  and  cause  a  resonance,  it  makes  sense  to  randomize  selections 
over  the  space  of  all  good  candidates  as  job  destinations.  In  the  situation  where  there 
are  too  few  good  candidates  for  the  number  of  jobs  to  be  transferred  by  all  agents, 
some  agents  will  delay  their  transfer  for  a  random  amount  of  time.  We  are  now  ready 
to  describe  how  this  is  done  specifically  for  the  load-balancing  problem. 
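The problem decision rule of Section 5.6.1, with its randomized selection over the space of good candidates, can be sketched in Python as follows. Several points are assumptions made only for illustration: the expectations of Section 5.5 are supplied as precomputed numbers, the payoff of local execution is passed in directly (no formula for it is given above), the selection is taken to be uniform over the candidate set, and all names are hypothetical.

```python
import random

def offload_payoff(i, j, cond_target, cond_self, exp_util):
    """Payoff of agent i offloading to agent j: the three-term sum of Section 5.6.1.

    cond_target[j] : conditional expected utility of A_j, as in (5.31)
    cond_self[j]   : conditional expected utility of A_i given an offload to A_j, as in (5.32)
    exp_util[k]    : expected utility of A_k's current state, as in (5.24)
    """
    others = sum(u for k, u in exp_util.items() if k not in (i, j))
    return cond_target[j] + cond_self[j] + others

def decision_set(payoff, local):
    """Decisions whose payoff is at least that of executing the job locally."""
    return sorted(k for k, value in payoff.items() if value >= payoff[local])

def choose(payoff, local, rng):
    """Randomized selection from the decision set (uniform choice is an assumption)."""
    return rng.choice(decision_set(payoff, local))
```

Because local execution always belongs to its own decision set, `choose` always returns a decision; when no remote agent has a payoff at least as high, the job simply stays local.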

An agent $A_i$, about to make a load-balancing decision, must first determine if it
should delay its decision. This will depend on the likelihood of a resonance, based on
$A_i$'s current information about the load levels of other agents,

$$[K_i(t)]_L = (L_1(n - a_{1i}),\ L_2(n - a_{2i}),\ \ldots,\ L_N(n - a_{Ni})).$$

Agent $A_i$ will determine the number of jobs it expects all agents are capable of
accepting, and the total number of jobs it expects will be transferred by all agents.
We will make use of the thresholds defined earlier, $R_u$ and $R_o$. Recall that, if an
agent's average number of ready jobs is below $R_u$, the agent is considered underloaded,
and if an agent's average number of ready jobs is above $R_o$, the agent is considered
overloaded.

Let $C_k$, the job capacity of $A_k$, be the expected number of jobs $A_k$ is capable of
accepting from other agents:

$$C_k = \max(0,\ R_u - L_k(n - a_{ki})), \quad 1 \le k \le N.$$

If $A_k$'s load level is below $R_u$, then $A_k$ has, on the average, room for $R_u - L_k(n - a_{ki})$
jobs.

Let $C$ be the total job capacity of all agents:

$$C = \sum_{k=1}^{N} C_k.$$

Let $J_k$, the job overflow of $A_k$, be the expected number of jobs $A_k$ has to offload,

$$J_k = \max(0,\ L_k(n - a_{ki}) - R_o).$$

If $A_k$'s load level is above $R_o$, then $A_k$ has, on the average, $L_k(n - a_{ki}) - R_o$ jobs to
possibly offload.

Let $J$ be the total job overflow of all agents,

$$J = \sum_{k=1}^{N} J_k.$$
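The totals $C$ and $J$ follow directly from the reported load levels. A minimal Python sketch, assuming the levels are available as a list of numbers and using hypothetical function names and sample thresholds:

```python
def job_capacity(level, r_u):
    """C_k: expected number of jobs an agent at this load level can accept."""
    return max(0, r_u - level)

def job_overflow(level, r_o):
    """J_k: expected number of jobs an agent at this load level has to offload."""
    return max(0, level - r_o)

def totals(levels, r_u, r_o):
    """Return (C, J), the total job capacity and total job overflow."""
    capacity = sum(job_capacity(level, r_u) for level in levels)
    overflow = sum(job_overflow(level, r_o) for level in levels)
    return capacity, overflow
```

For example, with reported levels `[0, 1, 5, 7, 3]`, an underload threshold of 2 and an overload threshold of 4, `totals` returns a capacity of 3 and an overflow of 4, so there are more jobs to move than expected room for them.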

Given $C$ and $J$, should an overloaded agent offload now, or wait? We want the
agent to make this determination quickly, without having to consult other agents.
Therefore, the agent must consider the likelihood that many jobs might be sent to a
small number of agents. To determine this, we consider the following graph matching
problem.

Given a graph with two sets of vertices,

\[ \mathcal{J} = \{ j_1, j_2, \ldots, j_J \}, \]

and

\[ \mathcal{C} = \{ c_1, c_2, \ldots, c_C \}, \]

where the size of $\mathcal{J}$ is $J$, the total job overflow, and the size of
$\mathcal{C}$ is $C$, the total job capacity, let a matching $m$ be a set of edges
where, for each edge, one vertex is from $\mathcal{J}$ and the other vertex is from
$\mathcal{C}$, and each vertex in $\mathcal{J}$ is incident to at most one edge. A
matching need not include all vertices in $\mathcal{J}$ or in $\mathcal{C}$. Let
$M(J, C)$ be the set of all possible matchings given $\mathcal{J}$ and $\mathcal{C}$.
Figure 5.1 contains an example of a matching.


Figure  5.1.  Example  of  a  matching. 


A matching has the at-most-one property if the degree of any vertex in $\mathcal{C}$
is at most one. Let $M_+(J, C)$ be the set of all possible matchings that have the
at-most-one property. Figure 5.2 contains an example of a matching with the
at-most-one property.




Figure  5.2.  A  matching  with  the  at-most-one  property. 


Relating this to the load-balancing resonance problem, $\mathcal{J}$ represents all
the jobs that can be offloaded, and $\mathcal{C}$ represents all the slots available
at all agents for those jobs. A matching is simply a mapping of jobs to slots. A
matching with the at-most-one property is a mapping where no slot gets more than one
job, a desirable situation in avoiding resonances.

A vertex $j \in \mathcal{J}$ is assigned in matching $m$ if $m$ contains an edge
$(j, c)$, $c \in \mathcal{C}$. Consider the following algorithm for building a
matching $m \in M(J, C)$. For each $j \in \mathcal{J}$, randomly decide whether it
should be assigned in $m$ or not, with probability

\[ P(j \text{ is assigned}) = p. \]

The value of $p$ will be defined shortly.

If $j$ is to be assigned, then randomly select $c \in \mathcal{C}$ such that each
vertex in $\mathcal{C}$ has an equal likelihood of being chosen, and let $(j, c)$ be
an edge in $m$.


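If matchings are created in this fashion, then each vertex of $\mathcal{J}$ is
assigned independently with probability $p$, so the probability that exactly $n$
vertices of $\mathcal{J}$ are assigned in the matching (or, equivalently, that the
matching will have exactly $n$ edges) is binomial:

\[ \binom{J}{n} p^n (1-p)^{J-n}, \quad 0 \le n \le J. \]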


If a matching has exactly $n$ assignments, what is the probability that the matching
has the at-most-one property? The total number of possible matchings given $n$
assignments is $C^n$. The total number of matchings which have the at-most-one
property, given $n$ assignments, is simply the number of permutations made up of $n$
vertices, chosen from the $C$ vertices in $\mathcal{C}$:

\[ P(C, n) = \frac{C!}{(C-n)!}, \quad 0 \le n \le C. \]


Therefore, the probability that a matching with $n$ assignments has the at-most-one
property is simply

\[ \frac{P(C, n)}{C^n} = \frac{C!}{(C-n)!\, C^n}, \quad 0 \le n \le C. \]
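The ratio $P(C,n)/C^n$ can be cross-checked by brute force for small values. The
sketch below is only an illustration: the function names are hypothetical, and the
enumeration simply counts how many of the $C^n$ equally likely slot choices use $n$
distinct slots.

```python
from itertools import product
from math import perm


def p_at_most_one(c, n):
    """P(C, n) / C**n: chance that n uniformly placed jobs land in n distinct slots."""
    return perm(c, n) / c**n if n <= c else 0.0


def p_at_most_one_brute(c, n):
    """The same quantity, obtained by enumerating all c**n slot choices."""
    hits = sum(1 for slots in product(range(c), repeat=n) if len(set(slots)) == n)
    return hits / c**n


print(p_at_most_one(4, 3), p_at_most_one_brute(4, 3))
```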


If we now consider $m_+$ to be a random variable, representing a matching from
$M_+(J, C)$ created using the algorithm described above, we can determine the
expected number of assignments $E(|m_+|)$,

\[ E(|m_+|) = \sum_{n=0}^{\min(J,C)} n \binom{J}{n} p^n (1-p)^{J-n} \frac{P(C,n)}{C^n}. \tag{5.36} \]


Now, the value of $p$ which maximizes $E(|m_+|)$ can be found by differentiating
the expectation in (5.36) with respect to $p$, setting the derivative equal to 0, and
solving for $p$:

\[ \frac{dE(|m_+|)}{dp} = \sum_{n=1}^{\min(J,C)} n \binom{J}{n} \frac{P(C,n)}{C^n}\, p^{n-1} (1-p)^{J-n-1} (n - Jp) = 0. \tag{5.37} \]


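As an illustration, consider the case where there are $J > 1$ jobs, but only one slot,
$C = 1$. Substituting for $C$ in the derivative, we get

\[ \sum_{n=1}^{1} n \binom{J}{n} \frac{P(1,n)}{1^n}\, p^{n-1} (1-p)^{J-n-1} (n - Jp) = 0, \]

which, after simplification, yields

\[ J (1-p)^{J-2} (1 - Jp) = 0. \]

Solving for $p$, and discarding the root $p = 1$ (which arises for $J > 2$ and yields
$E(|m_+|) = 0$), we get

\[ p = \frac{1}{J}. \]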


Intuitively, it makes sense that an agent should decide to offload a job with
probability $1/J$ if it knows that there are $J-1$ other potential senders, and only
one slot available. In fact, this is precisely what happens in a distributed system of
computers connected by an Ethernet [Metc76]. If $J$ computers wish to transmit a
packet at the same time, each should randomly decide to do so with probability $1/J$
to minimize the probability of collisions.
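For general $J$ and $C$ the maximizing $p$ can be found numerically. The sketch below
evaluates (5.36) on a grid of candidate values; the grid search and the function names
are assumptions made for illustration, not necessarily how the values were obtained
in this work.

```python
from math import comb, perm


def expected_assignments(p, j, c):
    """E(|m+|) from (5.36) for offloading probability p, J = j and C = c."""
    return sum(
        n * comb(j, n) * p**n * (1 - p) ** (j - n) * perm(c, n) / c**n
        for n in range(1, min(j, c) + 1)
    )


def optimal_p(j, c, steps=1000):
    """Grid-search the p in [0, 1] that maximizes E(|m+|)."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda p: expected_assignments(p, j, c))


print(optimal_p(4, 1))     # one slot, four senders
print(optimal_p(2, 1000))  # far more slots than jobs
```

With a single slot the search recovers $p = 1/J$, and when capacity greatly exceeds the
overflow it returns $p = 1$, since collisions are then unlikely.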

A table of optimal $p$ values, given $J$ and $C$, can then be constructed and made
available to each agent. Whenever an agent has a job to offload, it first computes $J$
and $C$ using its local information, looks up $p$, and makes a randomized decision. If
the job is not to be offloaded, then the agent can either elect to execute it locally, or
to wait until a later time to make the decision again. Since there are always new jobs
arriving, we choose always to begin execution of the job locally, and then randomize
the decision concerning the new job that arrives next.

We can now express the SPACE/TIME probability distribution for selecting a
decision. Recall from the previous section that an agent determines

\[ D_\delta(w) = \{ \delta_k : \Delta(\delta_k, w) \ge \Delta(\delta_i, w),\ \delta_k \in D_i \}, \]

the set of all decisions which $A_i$ can make with a payoff greater than or equal to
that of the decision to execute job $w$ locally. Since $p$ is the probability that a
job is offloaded, the probability of choosing local execution has already been
determined above:

\[ P(\delta = \delta_i) = 1 - p. \]

Using formula (4.24), the probability of choosing decision $\delta_j$, $j \ne i$, is

\[ P(\delta = \delta_j) = p \, \frac{\Delta(\delta_j, w)}{\sum_{\delta_k \in D_\delta(w),\ \delta_k \ne \delta_i} \Delta(\delta_k, w)}. \]

In summary, when an agent receives a new job, it first determines randomly
whether the job should be offloaded or not. This random decision is based on its
perception of how many jobs will be offloaded by all agents, and how much capacity
there is for accepting jobs by all agents. The agent uses this to determine the optimal
offloading probability, the one that maximizes the expected number of jobs offloaded
while avoiding resonances. If it is decided to offload the job, then one of the multiple
remote agents which are good candidates is selected randomly, with the probability of
selecting an agent proportional to its relative payoff in utility.
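The two-step randomization summarized above can be sketched as follows. This is a
minimal illustration: the function name, the dictionary of payoffs, and the agent
labels are hypothetical, and the payoffs are assumed to be positive.

```python
import random


def choose_decision(p_offload, payoffs, rng=random):
    """Return 'local' or the name of the selected remote agent.

    p_offload: offloading probability p (for example, looked up from a table).
    payoffs:   dict mapping each candidate remote agent to its positive payoff.
    """
    # TIME: with probability 1 - p, keep the job and execute it locally.
    if not payoffs or rng.random() >= p_offload:
        return "local"
    # SPACE: pick a destination with probability proportional to its payoff.
    agents = list(payoffs)
    weights = [payoffs[a] for a in agents]
    return rng.choices(agents, weights=weights, k=1)[0]


print(choose_decision(0.25, {"A2": 1.0, "A3": 3.0}))
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is
convenient in trace-driven simulation.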

5.6.3.  Decisionmaking  Processes  for  Load  Balancing 

We  can  now  describe  the  various  decisionmaking  processes  of  a  load-balancing 
agent.  An  agent’s  decision  procedure,  which  gets  evaluated  when  a  new  job  arrives 
and  whose  result  is  a  load-balancing  decision,  has  four  distinct  phases. 

(1)  situation  evaluation; 

(2)  job  evaluation; 

(3)  destination  evaluation; 

(4)  SPACE/TIME  randomization. 

In the first phase, situation evaluation, the agent considers whether a load-balancing
decision is actually necessary, given its beliefs about the global system state. In
particular, the agent considers its own local state. If the agent is lightly loaded
(i.e., the number of ready jobs is below some threshold $R_u$), there is no reason to
spend time evaluating the rest of the decision procedure since the decision to run the
job locally is a good one, and can be made quickly. If the agent is not lightly loaded,
it considers the states of remote agents. If all remote agents are heavily loaded (i.e.,
their numbers of ready jobs are above some threshold $R_o$), then again, there is no
reason to spend time evaluating the rest of the decision procedure since the decision to
offload anywhere is a bad one. Consequently, the decision to run locally is best, and
can be made quickly.

Assuming  the  situation  warrants  further  evaluation  of  the  decision  procedure,  the 
process  enters  the  second  phase,  job  evaluation.  Specifically,  it  is  determined  whether 
the job is a good candidate for offloading. For example, the longer the job's expected
execution  time,  the  more  desirable  the  job  is  for  offloading.  Although  in  general  such 
information  is  not  explicitly  available,  it  may  be  possible  to  infer  it  based  on  an 
analysis  of  past  behavior,  such  as  on  the  previous  execution  times  of  the  same  job,  or 
on  those  of  similar  type  jobs. 

The  third  phase  is  destination  evaluation,  where  remote  agents  are  considered  as 
possible  destinations  for  the  job.  In  this  phase,  hypotheses  are  made  concerning  the 
expected  improvement  in  utility,  and  are  based  on  local  information  about  the  states 
of  remote  agents.  We  have  called  this  expected  improvement  the  payoff  of  a  decision 
to transfer a job to a particular agent. Agents with positive payoffs make up the
space  of  possible  destinations,  used  by  the  next  phase. 

Finally,  the  fourth  phase  is  SPACE/TIME  randomization,  where  the  destination 
for  the  job  is  actually  selected.  This  selection  is  based  on  randomizing  over  the  space 
of  possible  candidates,  and  possibly  randomizing  over  a  future  time  interval  (as 
described  in  Section  5.6.2),  so  that  resonances  are  avoided. 

Figure 5.3 summarizes the phases of an agent's load-balancing decision procedure.

[Figure 5.3 diagram. Inputs to the four phases, in order: beliefs about local and
remote states; beliefs about job execution times; hypotheses about effects of
offloading; beliefs about local and remote states. Output: load-balancing decision.]

Figure 5.3. Phases of a Load-Balancing Decision Procedure.
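The four phases can be read as a chain of early exits that ends in a randomized
choice. The sketch below is only an illustration under stated assumptions: the
function signature, the `min_job_time` cutoff used in the job-evaluation phase, and
the dictionaries of remote loads and payoffs are all hypothetical.

```python
import random


def load_balancing_decision(local_load, remote_loads, job_time, payoffs,
                            p_offload, r_u, r_o, min_job_time, rng=random):
    """Return 'local' or the chosen remote agent for a newly arrived job."""
    # Phase 1: situation evaluation.
    if local_load < r_u:
        return "local"  # lightly loaded: run locally
    if all(load > r_o for load in remote_loads.values()):
        return "local"  # every remote agent is believed to be overloaded
    # Phase 2: job evaluation (assumed rule: short jobs are not worth moving).
    if job_time < min_job_time:
        return "local"
    # Phase 3: destination evaluation (keep agents with positive payoff).
    candidates = {a: v for a, v in payoffs.items() if v > 0}
    if not candidates:
        return "local"
    # Phase 4: SPACE/TIME randomization.
    if rng.random() >= p_offload:
        return "local"
    agents = list(candidates)
    return rng.choices(agents, weights=[candidates[a] for a in agents], k=1)[0]
```

Ordering the cheap tests first reflects the point made in the text: when the situation
does not warrant offloading, the local decision can be made quickly.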


5.6.4.  Communication  Decision  Rule 

Agents  must  decide  when  to  communicate  in  order  to  update  each  other’s  state 
information.  In  Section  4.6,  we  described  the  process  of  determining  when  to  inquire 
about  a  remote  agent’s  state.  This  inquiry  occurred  when  the  sum  of  the  loss  due  to 
communication  overhead  and  the  loss  due  to  degradation  in  decision  quality  (due  to 
aging  information)  was  at  a  minimum. 
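Choosing the inquiry period in this way is a one-dimensional minimization. The sketch
below shows the idea with two toy loss functions; the function name and both loss
shapes (overhead falling as $1/T$, staleness growing linearly) are assumptions for
illustration, not the loss functions derived in this work.

```python
def best_period(comm_loss, decision_loss, periods):
    """Return the period in `periods` that minimizes comm_loss(T) + decision_loss(T)."""
    return min(periods, key=lambda t: comm_loss(t) + decision_loss(t))


# Toy stand-ins: communication loss shrinks with longer periods, while the
# loss from aging information grows.
period = best_period(lambda t: 4.0 / t, lambda t: 0.25 * t, range(1, 21))
print(period)
```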

For load balancing, agents will use a two-way cooperative communication protocol
(see Figure 5.4): an agent will request a state update from a remote agent when its
information about that remote agent becomes too uncertain; an agent will voluntarily
offer a state update to a remote agent when it believes the quality of the decisions
made by that remote agent will significantly improve with updated information.


[Figure 5.4 labels: request when uncertainty large; offer when significant change
occurs.]

Figure 5.4. Two-way cooperative protocol for state updating.

When should an agent request an update from a remote agent? To answer this,
we first need a loss function for communication overhead. Assume that agent $A_i$
sends or receives a message (that is, communicates) with average period $T_\mu$. It
does not matter whom $A_i$ communicates with; it simply matters that a communication
takes place every $T_\mu$ time units. Assume also that every communication incurs a
fixed amount of overhead in time, $T_c$ (see Figure 5.5). We can then say that the
fraction of time $A_i$ wastes due to communication overhead is $T_c/T_\mu$.


[Figure 5.5 label: $T_c$.]


Figure 5.5. Communication overhead over time.


From (5.12), the fraction of time that an agent whose number of ready jobs
equals $\beta$ wastes due to job scheduling overhead is $\beta/\beta_{\max}$.
Combining the two sources of overhead, we can say that the loss due to communication
overhead, given the presence of job scheduling overhead, is the agent's state utility
without communication overhead, minus the agent's state utility including
communication overhead:

\[ L^{(T)}(T_\mu) = \frac{-\beta}{1 - \beta/\beta_{\max}} - \frac{-\beta}{1 - \beta/\beta_{\max} - T_c/T_\mu}. \tag{5.38} \]

Simplifying, we get

\[ L^{(T)}(T_\mu) = \frac{\beta\, T_c/T_\mu}{(1 - \beta/\beta_{\max})(1 - \beta/\beta_{\max} - T_c/T_\mu)}. \tag{5.39} \]


Checking extreme values, we see, as expected, that as the communication overhead
$h = T_c/T_\mu$ approaches zero, the loss goes to zero:

\[ \lim_{h \to 0} L^{(T)}(T_\mu) = 0, \]

and, as the communication overhead approaches $1 - \beta/\beta_{\max}$, the loss goes
to infinity:

\[ \lim_{h \to 1 - \beta/\beta_{\max}} L^{(T)}(T_\mu) = \infty. \]

We use this result to determine the maximum communication bandwidth for updating
state information, given the maximum loss an agent is willing to allow. Then, a
portion of this bandwidth is assigned to communication with every remote agent. We
have described how this is done in Section 4.6.5.
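Formula (5.39) can be inverted to obtain the shortest communication period that keeps
the loss within a given bound. The sketch below is an illustration only: the function
names and sample numbers are assumptions, and `max_loss` is assumed to be positive.

```python
def comm_loss(beta, beta_max, t_c, t_mu):
    """Loss (5.39) from communicating every t_mu time units at cost t_c each."""
    a = 1.0 - beta / beta_max  # fraction of time left after scheduling overhead
    h = t_c / t_mu             # fraction of time spent communicating
    if h >= a:
        return float("inf")    # no time left for useful work
    return beta * h / (a * (a - h))


def min_period(beta, beta_max, t_c, max_loss):
    """Shortest period t_mu whose loss does not exceed max_loss (> 0)."""
    a = 1.0 - beta / beta_max
    h = max_loss * a * a / (beta + max_loss * a)  # (5.39) solved for h
    return t_c / h


t_mu = min_period(beta=2, beta_max=10, t_c=0.01, max_loss=0.5)
print(t_mu, comm_loss(2, 10, 0.01, t_mu))
```

The reciprocal of the returned period is the maximum message rate, which would then be
divided among the remote agents as described in Section 4.6.5.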

We also need a loss function for degradation in decision quality due to aging
information. We will use the pair-wise approximation based on $A_i$'s local state, and
$A_j$'s possible local states, using formulas (4.20), (4.21), and (4.22) which were
developed in Section 4.6.4.

Consider an agent $A_i$'s decision to offload a job to remote agent $A_j$. Let
$\beta_i$ be $A_i$'s number of ready jobs, and let $A_i$'s information about $A_j$'s
load level be $\lambda$. Say that $w$, a job of average size, arrives at $A_i$. Let

+  rjijffl  +  M(/?j  +  l)>  (5‘40) 

represent the utility of $A_i$ and $A_j$ if $w$ is offloaded from $A_i$ to $A_j$, and let

RijiPiJj)  =  +  A0jh  (5-41) 


represent the utility of $A_i$ and $A_j$ if $w$ is retained by $A_i$.

If $A_i$ offloads $w$ to $A_j$, the sum of $A_i$'s state utility and the conditional
expected state utility of $A_j$ from $A_i$'s viewpoint will be

offlaad  =  £|U„(B,(n),uO  I  A,u,a,v]  +  E[u^B,(n),w)  j  X,v,ajt  ], 
which,  by  definition  (5.40),  equals 


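\[ \bar{u}_{ji}(a_{ji}) \mid \text{offload} = \sum_{\beta=0}^{\beta_{\max}-1} O_{ij}(\beta_i, \beta)\, [P_v^{a_{ji}}]_{\lambda\beta} + R_{ij}(\beta_i, \beta_{\max})\, [P_v^{a_{ji}}]_{\lambda\beta_{\max}}. \tag{5.42} \]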


Notice that, if $A_j$ has a maximum number of ready jobs, then it cannot accept
$A_i$'s job, and $A_i$ must consider the consequences of retaining the job.

If $A_i$ does not offload $w$ to $A_j$, the sum of $A_i$'s state utility and the
conditional expected state utility of $A_j$ from $A_i$'s viewpoint will be

~ri  ( a  JI )  i  retain  =  +  E[uj{Bj{n))  \  A,V,Oyt], 


which,  by  definition  (5.41),  equals 

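\[ \bar{u}_{ji}(a_{ji}) \mid \text{retain} = \sum_{\beta=0}^{\beta_{\max}} R_{ij}(\beta_i, \beta)\, [P_v^{a_{ji}}]_{\lambda\beta}. \tag{5.43} \]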

In fact, $A_i$ will base its decision to offload or not on which of these two utilities
is greater. Therefore, we can say that $A_i$ will make the decision which maximizes the
pair-wise conditional expected utility of $A_i$ and $A_j$,

\[ \bar{u}_{ji}(a_{ji}) = \max\bigl( \bar{u}_{ji}(a_{ji}) \mid \text{offload},\ \bar{u}_{ji}(a_{ji}) \mid \text{retain} \bigr). \tag{5.44} \]


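We now need to calculate the maximum possible sum of pair-wise conditional
expected utilities, $u^*_{ji}$, based on making the best decision if $A_j$'s state
were known by $A_i$. This is given by the formula,

\[ u^*_{ji}(a_{ji}) = \sum_{\beta=0}^{\beta_{\max}-1} \max\bigl( O_{ij}(\beta_i, \beta),\ R_{ij}(\beta_i, \beta) \bigr)\, [P_v^{a_{ji}}]_{\lambda\beta} + R_{ij}(\beta_i, \beta_{\max})\, [P_v^{a_{ji}}]_{\lambda\beta_{\max}}. \tag{5.45} \]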

The difference between $\bar{u}_{ji}(a_{ji})$ and $u^*_{ji}(a_{ji})$ is that the former
is the maximum of state utilities which are consequences of an offload or retain
decision. This decision is based on the expected utility of $A_j$'s state. The latter
is the expectation of the maximum of state utilities which are consequences of an
offload or retain decision, considered for each possible state of $A_j$.

Therefore, $A_i$'s loss function for degradation in decision quality due to aging
information (based on formula (4.22), see Section 4.6.2) about $A_j$ is

\[ L^{(d)}(a_{ji}) = u^*_{ji}(a_{ji}) - \bar{u}_{ji}(a_{ji}). \]
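The loss is the gap between deciding after the remote state is known and deciding on
expectations alone. The sketch below illustrates this with made-up numbers; the
function name, the utility lists, and the probabilities are all assumptions. The
special case of a full remote queue can be encoded by giving that state the same
utility in both lists.

```python
def decision_quality_loss(offload_utils, retain_utils, probs):
    """Return (u_bar, u_star, loss) for one remote agent.

    offload_utils[b], retain_utils[b]: pair-wise utility of each decision if the
    remote agent has b ready jobs; probs[b]: inferred probability of that state.
    """
    exp_offload = sum(p * o for p, o in zip(probs, offload_utils))
    exp_retain = sum(p * r for p, r in zip(probs, retain_utils))
    # Maximum of expectations: the decision is made before the state is known.
    u_bar = max(exp_offload, exp_retain)
    # Expectation of maxima: the best decision is made for each possible state.
    u_star = sum(p * max(o, r)
                 for p, o, r in zip(probs, offload_utils, retain_utils))
    return u_bar, u_star, u_star - u_bar


print(decision_quality_loss([-2, -6], [-4, -4], [0.5, 0.5]))
print(decision_quality_loss([-2, -6], [-4, -4], [1.0, 0.0]))
```

When the remote state is known with certainty the two quantities coincide and the loss
is zero; the loss grows as the inferred distribution spreads out with age.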

This loss is a function of the age of information; we need to express it as a function of
period $T_{ji}$. By (4.6) in Section 4.6.1,

We need the discrete time version of $L^{(d)}(T_{ji})$. Let

\[ N_{ji} = \frac{T_{ji}}{T}, \]

and let

\[ N_t = \frac{T_t}{T}, \]

where $T_{ji}$ and $N_{ji}$ are the continuous and discrete time communication periods,
respectively, $T_t$ and $N_t$ are the continuous and discrete average transmission
times, respectively, and $T$ is the continuous time state-transition period. Then,

\[ L^{(d)}(N_{ji}) = \frac{1}{T_{ji}} \sum_{a=N_t}^{N_{ji}+N_t} L^{(d)}(a) \cdot T. \tag{5.46} \]


Now that we have the communication loss function and the decision quality loss
function, we can compute the period $T_{ji}$ for which their sum is at a minimum, which
is the best period for communication.

Finally, an agent will voluntarily offer a state update to a remote agent when it
believes the quality of the decisions made by that remote agent will significantly
improve. When should this happen?

We have assumed all along that an agent $A_i$ can infer $A_j$'s state based on
knowledge of $A_j$'s load level $L_j$, and $A_j$'s state transition probability matrix,
$P_v$, which is derived from the measure of variability $V_j$. Therefore, when $L_j$ or
$V_j$ change, $A_j$ must broadcast their new values to all interested agents (which we
have assumed to be all agents). Note that $A_j$ knows the values of $L_j$ and $V_j$
with complete certainty, and, when they change, $A_j$ knows that all other agents have
old information which must be updated.

This method imposes an acceptable overhead since the load level and the degree
of variability change very slowly. (If they did not change slowly, other variables
would have to be identified which changed slowly in time, while providing enough
information to a parameterized model so that statistical inferences could be made
between updates.)
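The two triggers of the protocol can be sketched as follows. This is only an
illustration: the class and function names are hypothetical, the request trigger is
simplified to a fixed period counted in state transitions, and the `tolerance` that
decides what counts as a change is an assumption not specified in the text.

```python
class RemoteView:
    """What one agent believes about one remote agent."""

    def __init__(self, load_level, variability, period):
        self.load_level = load_level
        self.variability = variability
        self.period = period  # request period, in state transitions
        self.age = 0          # transitions since the last update

    def tick(self):
        """Advance one state transition; return True when a request is due."""
        self.age += 1
        return self.age >= self.period

    def update(self, load_level, variability):
        """Store fresh values received from the remote agent and reset the age."""
        self.load_level, self.variability, self.age = load_level, variability, 0


def should_offer(old, new, tolerance):
    """Offer an update when load level or variability moved by more than tolerance."""
    return any(abs(n - o) > tolerance for o, n in zip(old, new))
```

The requesting side keeps one `RemoteView` per remote agent, while the offering side
compares its current (load level, variability) pair with the values it last sent.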

In summary, we have described a two-way cooperative protocol between pairs of
agents, $A_i$ and $A_j$. When $A_i$'s information about $A_j$ becomes too uncertain,
$A_i$ sends a request to $A_j$ for an update. When $A_j$ senses a change in its
load level or degree of variability, it volunteers the new information to other agents.
The period between update requests is that which minimizes the sum of two loss
functions: the loss from communication overhead, and the loss from degradation in
decision quality due to aging information.

We now discuss load balancing experiments in which we use our principles and
techniques for intelligent decentralized control. The goal is to demonstrate the
application of these techniques and principles, and verify their feasibility. The
chapter is organized as follows. We will first summarize and analyze the load balancing
problem in more concrete terms than in the previous chapter. Next, we describe the
experimental setup and the approach, which includes a validation study of the
simulator. We then provide experimentally determined values for the parameters of the
models developed in Chapter 5. Finally, we present the results of the experiments.

6.1.  Experimental  Load  Balancing 

In  Chapter  3,  we  presented  a  formal  model  that  ignored  the  distinction  between  a 
machine  (or  computer  system),  which  supports  the  execution  of  jobs,  and  an  agent 
residing  on  the  machine,  which  makes  decisions  pertaining  to  a  decentralized  control 
problem. We now need to make this distinction, because in our experiments the
machine  is  simulated,  but  the  agent  is  real.  With  this  in  mind,  we  shall  summarize 
the  main  ideas  behind  load  balancing. 

The load balancing problem centers around dynamically assigning jobs to
machines so that some job performance index, such as the average response time, is
optimized. The important characteristics of the problem are: each machine has its
own job stream; an agent on each machine decides whether a job should execute
locally, on the machine owning the job stream it came from, or remotely, in which
case the job must be explicitly sent to a specific remote site; job arrival times and
service times are not known in advance to the agent (even though the inter-arrival
and service-time distributions may be predicted using past information).

We focus on the decentralized source-initiated form of the problem; i.e., when a
job arrives at a machine, the agent residing on that machine must make a load
balancing decision of whether to execute the job locally or remotely, and, if remotely,
where. This is in contrast to the receiver-initiated scheme, where agents request work
from other agents. There was no reason to select one scheme over the other. Either
would have satisfied our goal, which was to determine the feasibility of our techniques,
and not necessarily to find the best scheme for load balancing. (See [Eage86] for a
comparative analysis of the two schemes.)

How can load balancing optimize the average response time of jobs? A job's
lifetime can be divided into a number of time intervals, where for each interval the
job is characterized in one of three ways:

(1)  job  dependent  execution; 

(2)  job  dependent  sleeping; 

(3)  system  dependent  waiting. 

Thus, during the job's lifetime, the job is either executing, or sleeping (i.e., doing
nothing) for an interval of time which is dependent on the job's characteristics, or
waiting for a time which is dependent on external system factors. For example, if a
job cannot execute because there are other jobs which must also execute on the same
machine, this is a system dependent factor and therefore the job is classified as
waiting. On the other hand, if a job cannot execute because it must wait for input from
a user, this is a job dependent factor, and therefore the job is classified as sleeping.

The  sleep/wait  distinction  is  made  for  the  following  reason:  wait  time  can  be 
affected  by  load  balancing,  sleep  time  cannot.  A  good  load  balancing  scheme  will 
minimize  wait  time,  but  it  cannot  affect  sleep  time.  (We  are  taking  a  very  idealized 
view  of  the  separation  between  wait  and  sleep  times.  In  a  real  system,  sleep  time  and 
wait time are generally not independent of each other. For example, the user's input
speed,  and  the  input  itself,  in  an  interactive  program  can  be  affected  by  sluggish 
response  time.) 

So  if  a  goal  of  load  balancing  is  to  minimize  job  lifetimes,  and  only  the  wait  time 
component  of  the  lifetime  can  be  affected,  then  it  is  the  wait  times  which  must  be 
minimized.  Factors  contributing  to  wait  times  are  queueing  delays  at  the  CPU,  disks, 
or  network  communication  channels,  and  network  transfer  times  when  offloading.  If 
good  load  balancing  decisions  are  being  made,  jobs  will  be  assigned  to  machines  in 
such  a  way  that  contention  for  these  resources  is  spread  more  uniformly  over  time. 

The decision rules that agents will execute are stochastic replicated decision
functions [Stan85]. They are stochastic because decisions will be probabilistic, due to
the fact that important aspects of the job streams are not known in advance. They are
replicated because all agents use the same algorithm, control is fully decentralized,
and jobs can execute on any machine.

There are a number of important reasons why we chose load balancing as a vehicle
for illustrating the usefulness of our methods. Load balancing is a relevant
research  topic  in  itself,  worthy  of  investigation.  The  operating  systems  community 
has  recognized  the  importance  of  location-independent  process  (i.e.,  job)  design,  so 
that  load  balancing  of  jobs  is  feasible  [Powe83].  Also,  load  balancing  has  been  found 
to  be  effective  [Zhou87],  although  the  question  remains  as  to  what  is  the  best  way  to 
do  load  balancing.  Further,  current  distributed  systems  research  is  focusing  on  the 
design  of  very  large  distributed  systems  [Ande87],  where  the  potential  for  resource 
sharing, and in particular, processor sharing, is great. Finally, one can create
controlled, meaningful simulation experiments for load balancing, given the
availability of job trace data, and one can verify the realism of a simulated
environment, something we have made a great effort to do.

6.2.  Experimental  Setup 

The  environment  of  our  load  balancing  experiments  is  a  simulated  distributed 
system  of  DEC  VAX/780  machines  running  the  Berkeley  Unix  operating  system. 
These machines are connected by a point-to-point network whose topology is
randomly created for each experiment with the constraint that each machine has, on the
average, three neighbors.


Each  machine  individually  simulates  its  own  job  activity  (i.e.,  the  scheduling  and 
movement  of  jobs  among  a  number  of  servers).  The  network  is  simulated  in  the  sense 
that  inter-machine  transmissions  are  delayed  as  a  function  of  the  distance,  i.e.,  the 
number  of  hops  between  machines.  Routing,  link  traffic,  and  congestion,  are  not 
simulated,  but  queueing  of  messages  at  the  source  and  destination  machines  (not  at 
the  intermediate  nodes)  is. 

6.2.1.  Processor  Simulation  Model 

Each machine is individually modeled as a set of five servers with queues: a
source, a CPU, an I/O device, a network interface, and a sink. The servers, queues,
and their connections are illustrated in Figure 6.2.


Figure 6.2. Simulation model of a single machine.


These servers and queues are occupied by jobs, which are the basic units of work.
There are two types of jobs, user and system. User jobs generally represent work
initiated by human users, and are the objects which may get offloaded for load
balancing. System jobs represent work done on behalf of the system (e.g., scheduling).

The  source  server  produces  user  jobs  at  specific  points  in  time.  Characteristics  of 
these  user  jobs,  such  as  their  arrival  time,  their  total  execution  time,  and  their  total 
elapsed  time,  are  determined  by  reading  a  trace  of  job  accounting  records  derived 
from  the  real  workload  of  a  Berkeley  Unix  system.  Consequently,  the  simulation  is 
trace-driven. 




The CPU server simulates the time-shared execution of jobs. Jobs are served, or
executed, by being delayed in the CPU server for a short fixed period of time $q$
called a quantum. They will repeatedly visit the CPU server, each time for one
quantum, until they have executed for a time equal to their total execution time defined
in the trace.

The CPU server has four queues: arrival, foreground, background, and system.
When a user job arrives at a busy CPU, it enters the arrival queue if it is a new job,
the foreground queue if it is a young job, and the background queue if it is an old job.
A new job is one which has not accumulated any CPU time yet. A young job is one
which has accumulated less than $T_{f/b}$ seconds of time, and an old job is one which
has accumulated at least $T_{f/b}$ seconds of time. $T_{f/b}$ is a tunable parameter of
the simulator. Only user jobs are classified as new, young or old. The system queue is
used only for system jobs, which go there regardless of their accumulated CPU time.

The queueing policy is as follows. When the CPU server releases a job (the job
has executed for one quantum) it removes a job from the system queue, if one exists,
and executes it. If the system queue is empty, it looks at the arrival queue for a job.
If the arrival queue is empty, it looks at the foreground queue for a job. If the
foreground queue is empty, it finally looks at the background queue for a job. Thus, the
queues implement a single virtual priority queue, with system, new, young, and old
being the job priority order. Jobs of the same type are handled first-come-first-served.
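The queueing policy can be sketched with four first-come-first-served queues scanned
in priority order. This is a minimal illustration, not the simulator's code: the class
name and the dictionary representation of a job are hypothetical, while the default
threshold of 0.75 seconds is the $T_{f/b}$ value reported later in this chapter.

```python
from collections import deque

PRIORITY = ("system", "arrival", "foreground", "background")


class CpuQueues:
    def __init__(self, t_fb=0.75):
        self.t_fb = t_fb  # foreground/background threshold, in seconds
        self.queues = {name: deque() for name in PRIORITY}

    def classify(self, job):
        """Pick the queue for a job from its type and accumulated CPU time."""
        if job["kind"] == "system":
            return "system"
        if job["cpu_time"] == 0:
            return "arrival"  # new job
        return "foreground" if job["cpu_time"] < self.t_fb else "background"

    def enqueue(self, job):
        self.queues[self.classify(job)].append(job)

    def next_job(self):
        """Return the next job to run, or None if every queue is empty."""
        for name in PRIORITY:
            if self.queues[name]:
                return self.queues[name].popleft()
        return None
```

Scanning the queues in a fixed order is what makes them behave as the single virtual
priority queue described above.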

The  I/O  server  is  an  infinite  server  with  no  queueing.  It  models  a  job’s  I/O  (e.g., 
terminal  and  disk)  time  as  a  fixed  delay  between  visits  to  the  CPU.  The  total  I/O 
time,  summed  over  all  visits,  is  derived  from  trace  file  data.  (Unfortunately,  job 
arrival  times  at  each  I/O  device  were  not  available  from  the  traces,  and  therefore  they 
had  to  be  estimated.)  Note  that  we  do  not  model  the  queueing  that  might  actually 
occur  in  a  real  system.  This  was  done  for  two  reasons: 

(1)  this  simplifies  the  simulator; 

(2)  the CPU is by far the bottleneck for jobs in the systems we have observed, with
     very little queueing occurring at I/O devices.

The network server is used to transfer jobs from one machine to another, to support
load balancing. It simulates the delays that would be incurred in packet
transmission  in  a  real  system.  The  delay  times  are  determined  by  the  size  of  what  is 
being  transmitted,  and  by  the  distance  in  hops  between  the  machines.  A  message  is  a 
single  packet,  the  smallest  unit  of  size,  and  jobs  are  multiple  packets  comprising  their 
code  and  data  files  (the  number  of  packets  for  the  transfer  of  a  job  is  inferred  from  the 
trace  data).  The  network  topology  is  randomly  generated,  constraining  each  machine 
to  have  three  neighbors  (i.e.,  links  to  other  machines)  on  the  average. 

Finally, the sink server is the final destination of a job. Its function is to record
job statistics, and to release resources owned by the job.

6.2.2. Job Activity

So far, we have described the operation of each server in isolation. We now
consider the possible paths a job will take among the servers, which will illustrate
server interactions. When a job is created, a decision by an agent must be made whether
to execute it locally or remotely. Once this decision is made, our simulated machine
guides a job through each server until it has completed execution. Note that every
machine has its own single agent which makes load balancing decisions. For now, we
will defer the discussion of agents and load balancing, and focus on a job's simulated
activity within a machine.

When  a  job  arrives  at  a  machine  to  begin  execution,  it  first  enters  the  CPU 
server’s  arrival  queue.  The  arrival  queue  is  only  used  for  new  jobs;  once  the  job  has 
received  some  service  from  the  CPU,  it  will  use  the  foreground  queue  exclusively,  until 
its  execution  time  surpasses  a  threshold  Tf/b,  after  which  it  uses  the  background 
queue  exclusively. 

When  serviced  by  the  CPU,  the  job  executes  for  one  time  quantum  q.  After  this, 
assuming  the  job  needs  more  execution  time,  it  will  cycle  about  the  CPU  (and  its 
queues),  until  it  needs  to  do  I/O. 

Unfortunately,  the  trace  accounting  file  created  by  Berkeley  Unix  did  not  provide 
job  arrival  times  at  I/O  devices;  therefore,  they  had  to  be  estimated.  We  simply  used 
a  fixed  quantity,  Nq,  which  represents  the  maximum  number  of  times  a  job  can  cycle 
about the CPU before needing I/O. (Values for q, Nq, and Tf/b were determined by
experimentation. The optimal values, which provided the minimal error in the validation
tests described later, were q = 1/64 second, Nq = 8, and Tf/b = 0.75 second.)

After a job has visited the CPU Nq times, assuming it has not completed execution,
it goes to the I/O server. There, it receives a variable amount of service time,
with the constraint that, after all its visits to the I/O server, the total I/O time equals
the value for total I/O time provided by the trace file. After I/O service completes,
the job returns to the CPU server.

The cycling between the CPU and I/O servers continues until the job has completed
execution. Upon completion, the job enters the sink server, where job statistics,
including the mean and variance of the job’s queueing and service times for each
server, are recorded. The job is then destroyed.
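
The path described above can be sketched in Python. This is only an illustrative sketch, not the simulator's code: the function name job_path is hypothetical, queueing delays are ignored, and the even split of the total I/O time across visits is a simplifying assumption (the text says only that the per-visit amounts vary and must sum to the trace total).

```python
from fractions import Fraction

Q = Fraction(1, 64)     # CPU time quantum q, in seconds
NQ = 8                  # CPU visits between consecutive I/O visits (Nq)
T_FB = Fraction(3, 4)   # foreground/background threshold Tf/b, in seconds


def job_path(cpu_time, io_time):
    """List the servers one job visits, ignoring all queueing delays."""
    quanta = -(-cpu_time // Q)                  # ceiling: number of quanta needed
    io_visits = max(quanta - 1, 0) // NQ        # I/O after every NQ quanta, not at the end
    # Assumption: total I/O time is split evenly; it is dropped if there are no visits.
    io_slice = io_time / io_visits if io_visits else Fraction(0)
    path, used = [], Fraction(0)
    for i in range(1, quanta + 1):
        if i == 1:
            queue = "arrival"                   # new jobs use the arrival queue once
        elif used <= T_FB:
            queue = "foreground"
        else:
            queue = "background"                # execution time has surpassed Tf/b
        path.append(("cpu", queue))
        used += Q
        if i % NQ == 0 and i < quanta:
            path.append(("io", io_slice))
    path.append(("sink", None))
    return path
```

For example, job_path(Fraction(1, 4), Fraction(1, 2)) describes a job that needs 16 quanta: it makes one I/O visit after its eighth quantum and reaches the sink after the sixteenth.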

The job's simulated elapsed time is the time interval which begins when the job
is generated by the source server, and ends when it leaves the system at the sink
server. Note that this simulated elapsed time is the sum of the job's execution and
I/O times, which are given values from the trace file, and the CPU queueing time,
which is a function of the simulated system’s dynamic behavior. (It also includes
network queueing and transmission delay time if load balancing is in effect, and the job
has been transferred from one machine to another.) If the simulator works well, the
simulated elapsed time (with no load balancing) will be close to the real elapsed time,
which is also obtained from the trace file. This was one of the measures we used for
the simulator’s validation.

6.2.3.  Job  Movement  Between  Machines 

When  load  balancing  is  activated,  an  agent  considers  a  new  job  for  machine 
placement  after  leaving  the  source  server.  If  the  job  is  to  execute  locally  (on  the  same 
machine  where  it  was  generated),  it  goes  to  the  local  CPU  server.  If  it  is  to  execute 
on  a  remote  machine,  it  goes  to  the  local  network  server,  and  from  there  goes  to  the 
remote  machine’s  CPU  server.  Once  the  placement  decision  is  made,  the  job  resides 
for  its  entire  lifetime  on  the  selected  machine.  This  is  in  contrast  to  job  migration 
(also referred to as process migration), where a job can be moved at any time during
its  lifetime.  Although  job  migration  is  more  difficult  from  an  operating  system  design 
point  of  view,  it  might  produce  better  load  balancing  results  since  redistribution  of 
load  occurs  on  a  finer  granularity.  As  discussed  in  Section  5.6.1,  the  consequences  of 
bad  job  migration  decisions  are  less  severe  than  those  of  bad  job  placement  decisions 
since  they  can  be  "corrected."  Since  our  goal  is  to  test  the  decisionmaking  capabilities 
of  a  new  decentralized  control  system,  and  not  necessarily  to  determine  what  the  best 
load balancing method is, we have chosen job placement load balancing for our
experiments.

6.2.4.  Operating  System  Overhead 

In  a  real  system,  not  all  CPU  time  is  devoted  to  running  jobs.  Some  time  is  lost 
to  overhead  incurred  by  the  operating  system  to  carry  out  such  operations  as  context 
switching,  priority  calculation,  queue  manipulations,  and  job  table  lookups,  to  name  a 
few.  In  our  simulator,  this  overhead  is  represented  by  a  system  job  which  runs 
periodically  (each  second)  and  uses  up  some  CPU  time. 

In Chapter 5, we proposed a simple model for the fraction of time spent in
CPU overhead, based on the idea that the more jobs there are for the operating
system to consider, the more time it spends in overhead. Assuming that the
number of ready jobs $\beta$ is known (the details about how $\beta$ can be obtained are
discussed later), this fraction was approximated by the linear function of $\beta$ given in
(5.12), which is

\[ \mathit{overhead} = \frac{\beta}{\beta_{max}} < 1. \]

Using this, we determined in (5.14) the ratio of a compute-bound job's expected
elapsed time to its execution time, for a given $\beta$:

\[ \frac{\text{elapsed time}}{\text{execution time}} = \frac{\beta}{1 - \beta/\beta_{max}}. \]

In fact, this is the measure used to define the utility of an agent in state $\theta$, where
$M(\theta) = \beta$.
To obtain $\beta_{max}$, and, more importantly, to check if the model reflects reality,
experiments  on  a  real  system  were  performed.  We  created  a  purely  CPU-bound  job 
which  used  10  seconds  of  CPU  time,  and  ran  it  every  10  minutes.  Each  time  it  ran, 
we  recorded  the  elapsed  time  and  the  number  of  ready  jobs  averaged  over  the  job’s 
elapsed  time.  The  results  are  summarized  in  Figure  6.3,  showing  the  elapsed  time  to 
execution  time  ratio,  as  a  function  of  the  average  number  of  ready  jobs. 


Figure 6.3. Ten-second execution of a CPU-bound process (horizontal axis: average
number of ready jobs).


Note the general shape of the curve fitted to the data points. Our model of
$\beta / (1 - \beta/\beta_{max})$ seems to fit. We can then choose the value of $\beta_{max}$ which
minimizes the mean square error between the points of the curve
$\beta / (1 - \beta/\beta_{max})$ and the experimentally measured points. The best value for
$\beta_{max}$ was found to be 32.258. For efficient implementation of the function, we
simply used $\beta_{max} = 32$.
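
A least-squares choice of the constant can be sketched as a one-dimensional search. The sketch below is an assumption-laden illustration: the names elapsed_ratio and fit_beta_max are hypothetical, the search is a plain grid search rather than whatever procedure was actually used, and the data points are synthetic values generated from the model itself, not the measurements behind Figure 6.3.

```python
def elapsed_ratio(beta, beta_max):
    """Model of the elapsed-to-execution-time ratio: beta / (1 - beta/beta_max)."""
    return beta / (1.0 - beta / beta_max)


def fit_beta_max(points, lo, hi, steps):
    """Grid-search the beta_max in [lo, hi] that minimizes the mean square error."""
    best, best_err = None, float("inf")
    for k in range(steps + 1):
        candidate = lo + (hi - lo) * k / steps
        err = sum((elapsed_ratio(b, candidate) - r) ** 2 for b, r in points) / len(points)
        if err < best_err:
            best, best_err = candidate, err
    return best


# Synthetic (ready jobs, ratio) pairs produced by the model with beta_max = 32;
# real use would substitute measured pairs. lo must exceed the largest beta.
points = [(b, elapsed_ratio(b, 32.0)) for b in (1, 2, 4, 8, 12)]
fitted = fit_beta_max(points, lo=16.0, hi=48.0, steps=3200)
```

With noiseless synthetic input the search simply recovers the value used to generate the points; with measured data the minimum would generally fall between grid points, so a finer grid or a proper optimizer would be preferable.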
6.2.5. Input Trace Description

The  trace  files  used  to  generate  input  to  the  simulator  are  derived  from  the  real 
workloads  of  systems  running  under  the  Berkeley  Unix  operating  system.  We  believe 
that  the  driving  of  the  simulator  with  genuinely  real  workloads  was  one  of  the  most 
important  decisions  we  made  when  designing  the  experiments.  This  is  because  load 
balancing is concerned with the dynamic behavior of workloads. A probabilistic workload
generator, which makes stationary assumptions about the job interarrival time
distributions,  or  the  service  time  distributions,  might  not  capture  this  behavior 
correctly.  Also,  we  wanted  the  results  of  our  experiments  to  reflect  how  a  real  system 
would  behave;  using  real  inputs  was  one  important  step  in  this  direction.  A  robust 
simulator  is  another  step,  to  be  described  shortly. 

The  traces  reflect  workloads  from  two  types  of  environments:  a  computer  science 
research  one,  and  a  staff-support  one.  The  computer  science  research  environment 
workload is primarily influenced by text formatting, program compiling, and CPU-intensive
simulation jobs. The main components of the staff-support environment
workload  are  text  editing  and  formatting,  and  mail  jobs.  Each  trace  represents  one 
full day of job activity. The traces were recorded at three different sites, U.C. Berkeley,
AT&T Bell Laboratories, and Bell Communications Research, from at least two
machines  at  each  site.  The  traces  from  U.C.  Berkeley  were  gathered  at  different  times 
of  the  year.  The  traces  from  the  other  sites  were  gathered  during  summer  months. 

We  divided  each  trace  into  smaller  traces  of  2-hour  periods,  thus  obtaining  a  very 
large  pool  of  trace  files  representing  a  variety  of  different  workloads,  and  capable  of 
creating a variety of different load levels (different for each machine, and varying
differently  over  the  time  of  each  machine’s  activity).  In  fact,  each  trace  was  used  to 
drive  a  single-machine  simulation  experiment  to  determine  how  the  number  of  ready 
jobs  varied  over  the  2-hour  time  period.  We  also  computed  an  overall  time-averaged 
value  of  the  number  of  ready  jobs  for  each  2-hour  time  period  (we  call  it  the  average 
load  for  the  trace),  giving  us  an  idea  of  whether  the  trace  produced  an  overall  low  or 
high load. This was useful in the load balancing experiments for constructing
non-uniform workload distributions over machines (e.g., high average load traces on some
machines, low average load traces on others) to see the effects, if any, of load
balancing in those conditions. For each experiment, a random assignment of traces to
machines  was  made,  keeping  constant  the  sum  of  the  average  loads  of  all  the  traces. 

The  relevant  per-job  data  contained  in  the  traces  are  the  job  name,  birth  time, 
total  CPU  usage  (user  and  system  time),  total  elapsed  time,  time-averaged  memory 
usage, and number of disk I/O’s (each taking a known, roughly constant, amount of
time).  The  times  are  recorded  in  discrete  units  of  1/64  second,  except  for  birth  times, 
which  are  in  units  of  1  second  (due  to  record  packing  limitations  in  the  trace  file).  To 
avoid  discretization  effects  of  arrivals  occurring  only  at  1  second  intervals,  a  different 
random  value  between  0  and  63/64  second,  in  units  of  1/64  second,  was  added  to  each 
birth  time. 
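
The birth-time adjustment can be illustrated as follows. This is a minimal sketch under stated assumptions: the function name jitter_birth_times is hypothetical, times are represented as integer counts of 1/64-second ticks, and the random offsets are drawn uniformly (the text says only that a different random value was added to each birth time).

```python
import random

TICKS_PER_SECOND = 64   # trace times are in units of 1/64 second


def jitter_birth_times(birth_seconds, rng):
    """Convert whole-second birth times to ticks, adding 0..63 random ticks to each."""
    return [t * TICKS_PER_SECOND + rng.randrange(TICKS_PER_SECOND)
            for t in birth_seconds]


# Jobs recorded in the same second no longer arrive at the same instant.
ticks = jitter_birth_times([0, 0, 5, 5, 9], random.Random(0))
```

Note that jobs recorded within the same second may end up in a different relative order after the offsets are added, so the result would need to be sorted before driving a simulation.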
Although the environment was simulated and trace-driven, that aspect of each
machine which makes decisions about whether to offload jobs or not, which we have
referred to simply as the agent, was real. From the perspective of the agent, what we
performed were trace-driven, simulation-driven experiments. What was simulated was
the environment which provided inputs to each agent, and which each agent could
affect. Each agent could be used in a real system without modification (except, of
course, for the interfaces).

All  experiments  lasted  for  two  hours  of  simulated  time.  Collection  of  statistics 
began  after  the  first  five  minutes  of  simulated  time  to  minimize  the  effects  of  system 
startup  transients.  Depending  on  the  number  of  machines  in  the  distributed  system, 
the experiments took anywhere from 20 minutes (1 machine) to 6 hours (30
machines)  of  real  time  to  execute  on  a  DEC  VAX  8600. 

6.2.6.  Validation  of  the  Simulator 

Simulation  is  a  desirable  experimental  method  because  it  allows  one  to  observe 
and test a system which may not be physically realizable or accessible. For our
experiments, we would need a large number of machines to construct a distributed system.
In  fact,  our  techniques  for  agent-based  decentralized  control  have  greater  significance 
when  there  are  large  numbers  of  machines,  since  global  state  uncertainty  grows  as  the 
number  of  machines  increases.  Simulation  experiments  can  be  repeated  many  times 
at relatively low cost, and the environment can be changed in each experiment in a
controlled  manner. 

Of  course,  a  simulation  only  makes  sense  when  the  simulator  does  in  fact  provide 
a  true  model  of  the  real  environment;  this  is  why  validation  is  not  only  an  important 
part  of  any  modeling  study,  it  is  a  necessary  part.  Only  after  a  complete  and  careful 
validation  can  the  experimenter  have  faith  in  the  results  produced  by  the  simulation. 
Equally important, any description of a simulation experiment must include the
validation procedure and its results.

We  chose  to  follow  two  separate  validation  procedures.  In  both  procedures  the 
goal  was  to  simulate  the  activities  of  a  single  machine.  Since  we  could  actually 
acquire  the  same  type  of  results  from  the  real  machine,  the  simulated  and  real  results 
could be compared and evaluated. Assuming we could rely on the simulated single
machine, we could then replicate it to create a large simulated distributed system.
(No real system of that kind was available to us for such experiments, so the simulated
distributed system itself could not be measured against a real one.) Then, using
measurements  of  real  point-to-point  network  transmission  delays,  we  could  model  the 
network  interconnecting  the  simulated  single  machines  as  a  delay,  dependent  on  the 
number  of  packets  and  the  number  of  hops. 

In  the  first  validation  procedure,  we  compared  the  simulated  and  real  elapsed 
times  of  jobs.  The  job  elapsed  time  can  be  considered  as  simply  the  sum  of  its  CPU 
execution  time,  its  I/O  time,  and  its  CPU  queueing  delay.  Since  the  CPU  and  I/O 
times are fixed by the trace, the only variable which depends on the dynamics of the
simulation is the CPU queueing delay. Thus, the relative differences between the simulated
and real elapsed times give us a measure of how faithfully queueing delays have been
modeled. This is important since CPU queueing delay is a major component of the
cumulative system-dependent delay a job experiences; it is this cumulative delay,
averaged  over  all  jobs,  that  load  balancing  attempts  to  minimize.  (The  other  major 
component  of  cumulative  system-dependent  delay  is  network  transmission  time  of 
offloaded  jobs.  Note  that  it  is  important  to  minimize  the  sum  of  CPU  queueing  delay 
and  network  transmission  delay.  This  may  mean  increasing  network  transmission 
delay  due  to  offloading  a  job  in  order  to  obtain  a  more  substantial  decrease  in  CPU 
queueing  delay,  and  consequently  an  overall  decrease  in  the  cumulative  delay.  This  is 
the  point  of  load  balancing.) 

Define the percentage relative error between simulated and real elapsed times as
follows:

\[ \text{percentage relative error} = 100\% \times \frac{|R - S|}{R} \]

where $R$ = real elapsed time, and $S$ = simulated elapsed time. Every time a job
completed, the percentage relative error was computed and stored. Figure 6.4 displays the
cumulative distribution of percentage relative error.


Figure  6.4.  Distribution  of  relative  error. 


The simulated elapsed times of 24% of all jobs were perfect, i.e., they matched
the real elapsed times exactly. 84% of all jobs were simulated to within 10% of their
real elapsed times, and 99% were simulated to within 50% of their real elapsed times.
The average relative percentage error per job was 7.69%.

One problem with the relative error measurement is that, if a job's real elapsed
time  is  a  very  small  value,  say  30  milliseconds,  and  its  simulated  elapsed  time  is,  say, 
15  milliseconds,  the  relative  error  is  50%,  which  would  be  the  same  relative  error  as 
that  for  a  job  whose  real  elapsed  time  is  4  hours  and  its  simulated  time  2  hours. 
Therefore,  we  also  measured  the  absolute  error,  defined  as  follows: 

absolute  error  =  |  R  —  S  | 

where,  again,  R  =  real  elapsed  time  and  S  =  simulated  elapsed  time.  The  average 
absolute  error  per  job  was  157  milliseconds,  and  the  average  job  elapsed  time  was 
23.89  seconds. 
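
The two error measures, and the kind of cumulative summary reported for them, can be written down directly. The sketch below is illustrative only: the function names are hypothetical, and the relative error is taken relative to the real elapsed time R, which is what the 30-millisecond and 4-hour examples in the text imply.

```python
def percentage_relative_error(real, simulated):
    """100% x |R - S| / R, with R the real and S the simulated elapsed time."""
    return 100.0 * abs(real - simulated) / real


def absolute_error(real, simulated):
    """|R - S|, in the same time unit as the inputs."""
    return abs(real - simulated)


def fraction_within(errors, limit):
    """Fraction of jobs whose error does not exceed limit (one point of the CDF)."""
    return sum(1 for e in errors if e <= limit) / len(errors)


# The two jobs from the text: identical relative error, very different absolute error.
short_job = (0.030, 0.015)               # seconds: 30 ms real, 15 ms simulated
long_job = (4 * 3600.0, 2 * 3600.0)      # seconds: 4 h real, 2 h simulated
```

Both example jobs have a relative error of 50%, while their absolute errors are 15 milliseconds and 2 hours, which is why the two measures are reported together.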

In the second validation procedure, we constructed a controlled experiment where
a purely CPU-bound job, called the test job, was executed periodically. Recall that
this was already done on the real system to obtain constants for modeling system
overhead. We also "ran" the test job on the simulated system. The test job had a
predetermined and selectable execution time of either 10 seconds or 60 seconds, and
the period between runs was 10 minutes, so that a test job was never started while a
previously started test job was still running. In both real and simulated systems, each
time the job ran, its elapsed time and the number of ready jobs averaged over the
elapsed time were recorded. Note that the elapsed time is the sum of the execution
time, a fixed known value, and of the time due to system overhead, which includes
CPU queueing time. Our hope was that the elapsed times as functions of the number
of ready jobs in both systems would turn out to be close to each other. This was
indeed the case. Graphs for the simulated elapsed-to-execution-time ratios and the
real elapsed-to-execution-time ratios are displayed in Figure 6.5.

Figure 6.5. Simulated and real elapsed times.


In  summary,  we  validated  the  simulator  in  two  ways,  one  showing  the  differences 
in  elapsed  times  on  a  per-job  basis,  the  other  showing  the  differences  of  the  elapsed 
times  of  a  single  test  job  running  over  a  range  of  different  load  levels. 

6.3.  Constants  for  State  Transition  and  Utility  Models 

In  Chapter  5,  we  developed  a  number  of  parameterized  models,  constituting  an 
agent’s  knowledge  for  load  balancing.  These  models  addressed  the  general  problem  of 
load balancing; given our experimental system, we now provide the values for the
constants appearing in the models.

6.3.1.  Abstract  State  Space 

An agent $A_i$'s abstract state indicator $I(x_i)$ was defined as the instantaneous
number $R_i$ of jobs ready for execution. This is simply the total number of jobs
located in the CPU server, waiting in any of the queues, and including the job
currently being executed by the CPU. As this value can change with each clock tick,
an agent samples this value every $T_s$ time units to obtain the time series
$R_i(n), R_i(n-1), R_i(n-2), \ldots$ A sampling period of

\[ T_s = 1/64 \text{ second} \]

was felt to be sufficient to retain the base frequency component of the time series, and
yet impose little overhead due to the sampling itself.

To remove high frequency components, the autoregressive model (5.1), which we
repeat here, was used:

\[ \bar{R}_i(n) = \omega \bar{R}_i(n-1) + (1 - \omega) R_i(n). \]

The constant $\omega$ was set to

\[ \omega = 0.96466162. \]

Choosing this value for $\omega$, it can be shown that any single sample $R_i(n)$ will
contribute less than 4% to $\bar{R}_i(n)$, since $1 - \omega \approx 0.035$. Yet, if $\bar{R}_i(n)$ equals some
value $r_0$, and $R_i(n+k)$ equals a constant value $r_1$ for 64 periods (one second),
$\bar{R}_i(n+64)$ will cover about 90% of the distance between $r_0$ and $r_1$, since
$\omega^{64} \approx 0.1$.

These two properties mean that

(1) transient values of $R_i(n)$ have a small effect on $\bar{R}_i(n)$;

(2) $\bar{R}_i(n)$ tracks fundamental components of the time series $R_i(n)$ well over
relatively short periods of time (i.e., one second).
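
Both properties can be checked numerically. The following is a minimal sketch, assuming only the recurrence (5.1) and the quoted constant; the function name smooth and the step-input test signal are illustrative choices, not part of the original experiments.

```python
OMEGA = 0.96466162   # smoothing constant quoted in the text


def smooth(samples, start):
    """Apply r_bar(n) = OMEGA * r_bar(n-1) + (1 - OMEGA) * r(n) to a sample sequence."""
    r_bar = start
    out = []
    for r in samples:
        r_bar = OMEGA * r_bar + (1.0 - OMEGA) * r
        out.append(r_bar)
    return out


# Property (1): weight given to any single new sample (about 0.035).
single_sample_weight = 1.0 - OMEGA

# Property (2): fraction of a 0 -> 1 step covered after 64 samples (one second).
step_coverage = smooth([1.0] * 64, start=0.0)[-1]
```

The step coverage equals 1 - OMEGA**64, which comes out at approximately 0.9; this suggests the constant was chosen as the 64th root of 0.1, although the text does not say so.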

We use $\bar{R}_i(n)$ to compute $B_i(n)$, the number of ready jobs, as given by (5.2):

\[ B_i(n) = ROUND(\bar{R}_i(n)). \]

This also represents the agent’s local abstract state

\[ y_i(t) = B_i(n), \quad nT_s \le t < (n+1)T_s, \]

and makes up the agent’s abstract state space

\[ Y_i = \{0, 1, 2, \ldots, B_{max}\}. \]

The maximum number of ready jobs observed in our experiments was

\[ B_{max} = 25. \]

The only constraint on $B_{max}$ is

\[ B_{max} \le \beta_{max}, \]

where $\beta_{max}$ is the hypothesized number of ready jobs which would cause a machine to
spend all its time in overhead. We saw in Section 6.2.4 that $\beta_{max} = 32$; thus, the
constraint is satisfied.

The upper half of Figure 6.6 illustrates the variation of the number of ready jobs
over an interval of 6.5 minutes for a simulation driven by one of our traces. The
variation is representative of that of a highly loaded machine. A less loaded machine
would typically show much less activity during any six minutes. The lower half of
Figure 6.6 shows the fundamental component of the frequency of this variation, which
has a period of approximately one minute. It also shows the load level $L_i(n)$, about
which the number of ready jobs varies, and the degree of variability $V_i(n)$.


Figure 6.6. Variation in the number of ready jobs. (Vertical axes: number of ready
jobs; horizontal axis: time in minutes; labeled curves: degree of variation,
fundamental component.)


For the example in Figure 6.6, a sampling period $T_s$ of 1/64 second, to produce
samples which are then time-averaged using the autoregressive model (5.2) to yield
values of the number of ready jobs spaced one second apart, is sufficient for capturing
the fundamental frequency component of $R_i$.

6.3.2.  Load  Level  and  Degree  of  Variability 

To obtain the load level $L_i(n)$, we use a moving average of the number of ready
jobs, $B_i(n)$, as given by (5.4):

\[ MA_L(n) = \sum_{k=0}^{N_L} w_k B_i(n-k), \]

with, in our case,

\[ N_L = 60. \]

Thus, the number of ready jobs is averaged over the past one-minute period. The
weights $w_k$ were all set to 1/60, thereby uniformly accounting for each sample over
the past minute:

\[ w_k = \frac{1}{60}, \quad 0 \le k < 60. \]

Using $MA_L(n)$, the load level $L_i(n)$ is given by (5.6):

\[ L_i(n) = \begin{cases}
ROUND(MA_L(n), H_L) & \text{if } MA_L(n) > L_i(n-1) + H_L/2 + h \\
ROUND(MA_L(n), H_L) & \text{if } MA_L(n) < L_i(n-1) - H_L/2 - h \\
L_i(n-1) & \text{otherwise,}
\end{cases} \]

with the number of ready jobs considered a significant change in load being

\[ H_L = 4. \]


Notice  in  Figure  6.6  that,  in  fact,  the  load  level  changes  with  an  average  period 
of  just  more  than  one  minute.  And  yet,  it  does  represent  a  long-term  average  of  the 
number  of  ready  jobs. 

The moving average of the absolute differences between the number of ready jobs
and the load level, $MA_V(n)$, is given by (5.5):

\[ MA_V(n) = \sum_{k=0}^{N_V} w'_k \, | B_i(n-k) - L_i(n-k) |, \]

with, in our case,

\[ N_V = 60. \]

The weights $w'_k$ were all set to 1/60, as we did for the weights $w_k$:

\[ w'_k = \frac{1}{60}, \quad 0 \le k < 60. \]

Using $MA_V(n)$, the degree of variability $V_i(n)$ is given by (5.5):

\[ V_i(n) = \begin{cases}
ROUND(MA_V(n), H_V) & \text{if } MA_V(n) > V_i(n-1) + H_V/2 + h \\
ROUND(MA_V(n), H_V) & \text{if } MA_V(n) < V_i(n-1) - H_V/2 - h \\
V_i(n-1) & \text{otherwise,}
\end{cases} \]

with $H_V$, the distance between the differences considered a significant change in
variation about the load level, being

\[ H_V = 1. \]
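
The update rule shared by the load level and the degree of variability is a quantizer with hysteresis, and can be sketched as below. Several points are assumptions rather than facts from the text: ROUND(x, H) is taken to mean rounding to the nearest multiple of H, the function names are hypothetical, and the margin h = 0.5 used in the example is an arbitrary illustrative value, since this section does not state the value of h.

```python
import math


def round_to_multiple(x, step):
    """Round x to the nearest multiple of step (assumed meaning of ROUND(x, H))."""
    return step * math.floor(x / step + 0.5)


def update_level(moving_avg, previous, H, h):
    """One update of a quantized level: change only when the moving average
    leaves the band [previous - H/2 - h, previous + H/2 + h]."""
    if moving_avg > previous + H / 2 + h or moving_avg < previous - H / 2 - h:
        return round_to_multiple(moving_avg, H)
    return previous


# Load level with H_L = 4 and an assumed margin h = 0.5:
stays = update_level(5.9, previous=4, H=4, h=0.5)    # inside the band, level kept
jumps = update_level(6.6, previous=4, H=4, h=0.5)    # outside the band, re-quantized
```

The margin h widens the band slightly beyond half a quantization step, which keeps the level from flipping back and forth when the moving average hovers near a boundary.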


6.3.3.  State  Transition  Probability  Matrix 

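We saw in Section 5.4.2 that the parametric model for one-step state transitions,
$P_v$, is a $B_{max} \times B_{max}$ matrix given by (5.9):

\[ P_v = \begin{pmatrix}
\frac{1+p_v}{2} & \frac{1-p_v}{2} & 0 & \cdots & 0 & 0 \\
\frac{1-p_v}{2} & p_v & \frac{1-p_v}{2} & \cdots & 0 & 0 \\
0 & \frac{1-p_v}{2} & p_v & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & p_v & \frac{1-p_v}{2} \\
0 & 0 & 0 & \cdots & \frac{1-p_v}{2} & \frac{1+p_v}{2}
\end{pmatrix} \]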


Recall that $p_v$ is a decreasing function of $V$, as given by (5.10):

\[ \frac{dp_v}{dV} < 0. \]


We  now  explicitly  define  this  relationship,  on  the  basis  of  empirical  evidence. 
We  conducted  approximately  300  single-machine  experiments,  each  driven  by  a 
different  two-hour  trace  file.  For  each  experiment,  three  time-series  were  generated: 

(1) $B_i(n)$, the time-series for the number of ready jobs for trace $i$;

(2) $L_i(n)$, the time-series for the load level for trace $i$;

(3) $V_i(n)$, the time-series for the degree of variability for trace $i$.

We then divided $B_i(n)$, for each trace $i$, into a number of smaller time-series

\[ [B_i(m,n)]_{\lambda,v}, \]

where $m$ and $n$ define an interval for a sequence of numbers of ready jobs,

\[ B_i(m),\ B_i(m+1),\ \ldots,\ B_i(n), \]

such that $L_i(k) = \lambda$ and $V_i(k) = v$ for $m \le k \le n$. Thus, over the interval $[m,n]$, the
load level and the degree of variability remain constant.

The reason for doing this is the assumption that the conditional distribution
$p(B_i(n) \mid B_i(n-m))$ is stationary if $L_i(k)$ and $V_i(k)$ are constant for $k \in [m,n]$. (In fact,
under these conditions the distribution is considered second-order or weakly stationary
[Chat85] by definition, since second-order stationarity implies that the mean and the
variance of $(B_i(n) \mid B_i(n-m))$ are constant. $L_i(k)$ and $V_i(k)$ are measures of the mean
and variance, respectively.)

For each time-series $[B_i(m,n)]_{\lambda,v}$ we generated a three-dimensional frequency
table $f_{\lambda,v,i}(x,y,z)$, where the $(x,y,z)$ entry indicates the number of times $B_i(k) = x$, given
$B_i(k-z) = y$, for $x, y \in \{0, 1, \ldots, B_{max}\}$, and $m \le z \le k \le n$. Tables generated from
time-series which had the same load level and degree of variability were then combined
(additively) to produce a set of frequency tables

\[ F_{\lambda,v}(x,y,z) = \sum_i f_{\lambda,v,i}(x,y,z). \]
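
The construction of one frequency table, and the additive combination of several, can be sketched as follows. This is a hedged illustration: the names frequency_table and combine are hypothetical, the table is stored sparsely as a counter keyed by (x, y, z), and the cap max_lag on the lag z is an assumption introduced to keep the example small.

```python
from collections import Counter


def frequency_table(series, max_lag):
    """Count how often B(k) = x occurs together with B(k - z) = y, for lags z <= max_lag.

    series is one sub-series of ready-job counts over which the load level and
    degree of variability are constant. Keys of the result are (x, y, z).
    """
    table = Counter()
    for k in range(len(series)):
        for z in range(0, min(k, max_lag) + 1):
            table[(series[k], series[k - z], z)] += 1
    return table


def combine(tables):
    """Add together tables that share the same load level and degree of variability."""
    total = Counter()
    for t in tables:
        total.update(t)      # Counter.update adds counts rather than replacing them
    return total


t = frequency_table([4, 4, 5], max_lag=1)
```

Dividing each lag-z slice of a combined table by its column totals would give an empirical estimate of the z-step transition distribution, which is what the fitted matrix is compared against.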

From these frequency tables, we verified that the form of the state transition
probability matrix $P_v$ corresponded closely to the observed frequencies; the only remaining
task was to determine $p_v$ as a function of $V_i$. In our experiments, we observed that
$V_i$ took on only five possible values:

\[ V_i \in \{0, 1, 2, 3, 4\}. \]

For  each  of  these  values,  we  then  found  a  value  for  pv  which  provided  a  best  fit  (using 
minimal  mean-square  error)  to  the  distributions  defined  by  the  frequency  tables. 
These  were  found  to  be: 

\[ p_0 = 0.993, \quad p_1 = 0.980, \quad p_2 = 0.961, \quad p_3 = 0.934, \quad p_4 = 0.901. \]

With  this,  the  one-step  state  transition  matrix  Pv  is  completely  defined,  assum¬ 
ing  that  Vi(n)  for  any  single  machine  in  the  load  balancing  experiments  would  not 
contain  values  greater  than  4.  (This  was  indeed  the  case.) 
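
With the fitted values in hand, the one-step matrix can be built explicitly. The sketch below is illustrative: the function name transition_matrix is hypothetical, and the matrix is given one row per state 0, 1, ..., 25 (26 rows), which is an assumption about how the states are indexed.

```python
P_FIT = {0: 0.993, 1: 0.980, 2: 0.961, 3: 0.934, 4: 0.901}   # fitted p_v values
B_MAX = 25


def transition_matrix(v, size=B_MAX + 1):
    """Tridiagonal one-step matrix: stay with probability p_v, otherwise move one
    state up or down with equal probability; boundary states absorb the move
    that would leave the state space."""
    p = P_FIT[v]
    side = (1.0 - p) / 2.0
    P = [[0.0] * size for _ in range(size)]
    for s in range(size):
        P[s][s] = p
        if s > 0:
            P[s][s - 1] = side
        if s < size - 1:
            P[s][s + 1] = side
    P[0][0] = (1.0 + p) / 2.0
    P[size - 1][size - 1] = (1.0 + p) / 2.0
    return P


P2 = transition_matrix(2)
```

Every row sums to one, so each row is a valid probability distribution over next states; a larger degree of variability gives a smaller p_v and therefore more probability of leaving the current state at each step.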

Figures 6.7, 6.8, and 6.9 illustrate the effect of the different degrees of variability
and of information aging on the state transition probabilities

\[ p(B_j(n) = \beta \mid L_j(n - a_{ji}) = \lambda,\ V_j(n - a_{ji}) = v), \quad 0 \le \beta \le 25. \]

Each figure shows a family of state transition distributions, for a given load level $\lambda$
and degree of variability $v$. For each figure, the load level $\lambda$ was set to 4. The degree
of variability is different for each figure: in Figure 6.7, $v = 0$, $p_0 = .993$; in Figure 6.8,
$v = 2$, $p_2 = .961$; in Figure 6.9, $v = 4$, $p_4 = .901$. Finally, within each figure, distributions
are shown for a number of ages of information ranging between 0 and 120 seconds,
specifically, $a \in \{0, 10, 30, 60, 120\}$.

Figure 6.9. Distributions with $\lambda = 4$, $v = 4$ (horizontal axis: number of ready jobs).


There are two specific observations to be made about these state transition
distribution families. The first observation is that, within one family of distributions, the
width of the spread about the load level increases as information age increases. This is
particularly noticeable when one compares the distribution for age 0 with the
distribution for age 120 in any family; the distribution for age 0 has all its mass at the load
level, whereas the distribution for age 120 has its mass spread widely about the load
level.

The second observation is that, across families of distributions, the rate at which
the spread widens about the load level increases with the degree of variability $v$. This
can be seen most evidently by comparing the family of distributions with $v = 0$ to the
family of distributions with $v = 4$: the width of the spread increases with information
age much more rapidly for the latter.


6.3.4.  Utility  Models 

The local state utility of an agent (see Section 5.5.1) is given by (5.15):

\[ u_i(\theta) = \mu(\theta) = \frac{-\beta}{1 - \beta/\beta_{max}}, \quad M(\theta) = \beta. \]

Since we have determined that $\beta_{max} = 32$, we know the relationship between the
number of ready jobs $\beta$, and the utility. Note that the domain of $\mu(\theta)$ is the set of
integers $\{0, 1, 2, \ldots, B_{max}\}$, where, as we have also determined, $B_{max} = 25$.
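
As a quick illustration, the utility can be tabulated over its whole domain. This is a minimal sketch using the constants quoted above; the function name utility is hypothetical.

```python
BETA_MAX = 32   # ready jobs at which a machine would spend all its time in overhead
B_MAX = 25      # largest number of ready jobs observed in the experiments


def utility(beta):
    """Local state utility: the negated elapsed-to-execution-time ratio."""
    if not 0 <= beta <= B_MAX:
        raise ValueError("number of ready jobs outside the abstract state space")
    return -beta / (1.0 - beta / BETA_MAX)


table = [utility(b) for b in range(B_MAX + 1)]
```

The utility is 0 for an idle machine and falls faster than linearly as jobs accumulate; at 16 ready jobs, for instance, half the CPU time is modeled as overhead and the utility is -32.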
An agent also needs to determine the future state utility of a remote agent if a
job $w$ is offloaded to it. This conditional utility of agent $A_j$, from $A_i$’s viewpoint,
given that job $w$ will be offloaded from $A_i$ to $A_j$, is given by (5.19):

\[ u_{ji}(\theta, w) = \begin{cases}
\eta_{ij}(w) + \mu(\beta + 1), & \theta < B_{max},\ M(\theta) = \beta \\
-\infty, & \theta = B_{max}.
\end{cases} \]

To compute $u_{ji}(\theta, w)$, we need to specify how $A_i$ can compute $\eta_{ij}(w)$, which
depends on the ratio

\[ \frac{netdelay_{ij}(w)}{elap(w, 0)}. \]

To compute $netdelay_{ij}(w)$, agent $A_i$ needs to know:

(1) the average packet transmission rate between itself and $A_j$;

(2) the size of job $w$.

As mentioned earlier, the network topology was randomly generated for each
experiment, with the constraint that the average number of neighbors of a machine
would be 3. To get a rough idea of the distribution of the number of machines within
a given distance from a given agent, we consider a planar graph with an infinite
number of nodes, where each node represents a machine, and is adjacent to three
other nodes, and analyze the number of nodes within a small neighborhood of a given
node.
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Let 

f  ( d )  =  the  number  of  nodes  within  distance  d  of  the  given  node. 

f{d)  must  be  at  least  as  large  as  f(d- 1),  the  number  of  nodes  within  distance  d- 1. 
And,  each  node  at  distance  d- 1,  of  which  there  are  /  (cf-1)  -  /  {d- 2),  is  adjacent  to 
2  nodes  which  are  at  distance  d  from  the  given  node.  Thus,  we  have  the  following 
recurrence  relation: 

f(d)  =  f{d-l)+2{f{d-l)-f{d-2)),  /( 0)  =  0,  /( 1)=3. 

Solving  it,  we  get 

f{d)  =  3{2d  -  1),  d^ 0. 

Therefore,  we  see  that,  with  a  unit  increase  in  the  distance  from  the  given  agent,  the 
number  of  agents  it  has  to  communicate  with  roughly  doubles. 
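The recurrence and its closed-form solution can be checked numerically. The following is a minimal sketch; the function names `nodes_within` and `closed_form` are illustrative and do not come from the experiments.

```python
def nodes_within(d: int) -> int:
    """Iterate f(d) = f(d-1) + 2*(f(d-1) - f(d-2)) with f(0) = 0, f(1) = 3."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    prev, curr = 0, 3  # f(0), f(1)
    if d == 0:
        return prev
    for _ in range(d - 1):
        prev, curr = curr, curr + 2 * (curr - prev)
    return curr


def closed_form(d: int) -> int:
    """Closed-form solution f(d) = 3 * (2^d - 1)."""
    return 3 * (2 ** d - 1)


if __name__ == "__main__":
    for d in range(6):
        print(d, nodes_within(d), closed_form(d))
```

Each unit increase in d adds twice as many new nodes as the previous step did, which is the rough doubling noted above.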

Returning to our experiments, each agent A_i was provided with knowledge about
the distance in hops, d_ij, to every other agent A_j, and the average packet transmission
rate between nodes, which we set at 64 kilobytes per second. (Similar estimates were
observed by [Cabr88] for Berkeley Unix systems using ARPANET protocols.
Although this is a low bandwidth considering the high-speed networks available
today, we chose it to accentuate the consequences of bad decisions.) Therefore, if an
agent knows that the size of a job is sz(w) (in kilobytes), it can determine how long it
will take to transmit it to any other agent by computing

$$\mathit{netdelay}_{ij}(w) = \frac{sz(w) \cdot d_{ij}}{64\ \text{kbyte/s}}.$$

Although the trace file did not contain the actual job size and the sizes of the
data files which would also have to be shipped (assuming files are not replicated across
multiple machines), it did contain the number of disk I/Os a job requested, and its
average memory usage. From this information, we computed an approximate job
transmission size sz(w). In general, there is no reason why an agent could not know
exactly the total transmission size of a job, assuming all code and data files are explicitly
identified. The unavailability of the actual job size was a problem for us purely
because that size happened to not be recorded in the trace.

To compute η_ij(w), an agent also needs to know job w's execution time,
elap(w, 0). In contrast to the job size, this information is generally not available prior
to the execution of the job, and yet it is necessary for estimating the conditional utility,
a fundamental quantity needed for rational decisionmaking.

We  dealt  with  this  problem  by  recognizing  that,  when  a  job  w  arrives,  it  is  often 
the  case  that  w  has  been  executed  in  the  past.  If  this  is  the  case,  an  estimate  can  be 
made  about  its  execution  time,  based  on  its  past  behavior.  This  is  another  example  of 
special-purpose  knowledge  an  agent  would  have  for  load  balancing. 

A good indicator for identifying the same job over a number of executions is its
name,  denoted  by  n(w),  which  was  recorded  in  the  traces.  (As  the  name  of  a  job  is 
simply  the  name  of  the  file  containing  the  job's  executable  code,  it  certainly  is  possible 
to  have  different  jobs  with  the  same  name,  since  file  names  can  be  modified  over  time. 
Jobs  which  execute  frequently,  however,  generally  retain  the  same  name.) 

An agent keeps a list of the names of jobs that have executed in the past. As
this information is shared periodically between agents, the jobs could have executed
on any machine (and this information is valid for all machines since the machines are
homogeneous in our experiments). For each job name, an agent keeps track of the
number of times it has executed, plus the mean m(n(w)) and the coefficient of variation
c(n(w)) of the past execution times. The mean m(n(w)) is used to construct an
estimator for elap(w, 0), and the coefficient of variation c(n(w)) provides a measure of
the reliability of m(n(w)). Agents also keep track of the overall mean execution time
of all jobs, m(W). To estimate elap(w, 0), agents use the following formula:

$$\mathit{estimate}(\mathit{elap}(w, 0)) = \omega^{c(n(w))}\, m(n(w)) + \bigl(1 - \omega^{c(n(w))}\bigr)\, m(W), \qquad 0 < \omega < 1.$$

The estimate of elap(w, 0) is a weighted sum of the mean execution time of jobs with
name n(w), and the mean execution time of all jobs. When the coefficient of variation
c(n(w)) is small, m(n(w)) is emphasized; when c(n(w)) is large, m(W) is
emphasized. The best value for ω was found by executing a large number of jobs,
recording the actual job execution times, and then minimizing the sum of squared
errors

$$\sum_{w} \bigl[\mathit{real}(\mathit{elap}(w, 0)) - \mathit{estimate}(\mathit{elap}(w, 0))\bigr]^{2}.$$
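The weighted estimator and the fit of the weighting constant can be sketched as follows. This is only an illustration: the text does not say how the minimization was carried out, so the grid search in `fit_omega` is an assumption, and all function and parameter names are hypothetical.

```python
def estimate_elapsed(mean_by_name, cv_by_name, overall_mean, omega):
    """Weighted estimate: omega**cv weights the per-name mean, the rest the overall mean."""
    weight = omega ** cv_by_name
    return weight * mean_by_name + (1.0 - weight) * overall_mean


def fit_omega(samples, overall_mean, steps=999):
    """Pick omega in (0, 1) minimizing the sum of squared errors.

    samples: iterable of (mean_by_name, cv_by_name, actual_elapsed) tuples.
    A plain grid search is used here as an assumed stand-in for whatever
    minimization procedure was actually employed.
    """
    samples = list(samples)
    best_omega, best_sse = None, float("inf")
    for k in range(1, steps + 1):
        omega = k / (steps + 1)
        sse = sum(
            (actual - estimate_elapsed(m, c, overall_mean, omega)) ** 2
            for m, c, actual in samples
        )
        if sse < best_sse:
            best_omega, best_sse = omega, sse
    return best_omega
```

With a coefficient of variation of 0 the weight is 1 and the estimate equals the per-name mean; as the coefficient of variation grows, the weight tends to 0 and the estimate falls back to the overall mean.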

The number of entries (one per job name) in such a list can become very large for
each agent. Assuming that availability of memory is not a problem, agents are still
faced with the potential problem of highly time-consuming lookup times. To counteract
this, agents, in actuality, use two data structures for maintaining job information:
a balanced binary tree for information lookup, and a simple unordered list for
information recording. The binary tree allows for rapid lookup (requiring O(log n)
comparisons, where n is the number of job name records), and the unordered list
allows for rapid recording of job information. When an agent recognizes that the
machine on which it resides has no jobs to execute (i.e., it has spare time), it removes
a job information record from the unordered list, places it into the binary tree, and
does the balancing. (Note that if a job record with the same name already exists in
the tree, only a recomputation of the mean and coefficient of variation is necessary.)
This repeats until there are no more entries in the unordered list, or until a new job
arrives. When a new job arrives, the binary tree data structure must be left in a
stable state; thus, the job record currently being placed into the tree is completed, the
tree is balanced, and the new job can then begin execution.
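The two-structure scheme can be sketched as below. Note the substitution: Python's standard library has no balanced binary tree, so this sketch uses a sorted list searched with `bisect`, which gives O(log n) lookups but not O(log n) insertions. The class name `JobInfoStore`, its methods, and the running-statistics update are all illustrative assumptions.

```python
import bisect
import math
from collections import deque


class JobInfoStore:
    def __init__(self):
        self.pending = deque()  # unordered list: O(1) recording
        self.names = []         # sorted job names (stand-in for the balanced tree)
        self.stats = []         # parallel list of [count, mean, sum of squared deviations]

    def record(self, name, elapsed):
        """Cheap recording step, done when a job finishes."""
        self.pending.append((name, elapsed))

    def merge_one(self):
        """Idle-time step: move one pending record into the index.

        Returns False when nothing is pending. Each call completes fully,
        so the index is always left in a consistent state.
        """
        if not self.pending:
            return False
        name, elapsed = self.pending.popleft()
        i = bisect.bisect_left(self.names, name)
        if i == len(self.names) or self.names[i] != name:
            self.names.insert(i, name)
            self.stats.insert(i, [0, 0.0, 0.0])
        s = self.stats[i]
        s[0] += 1
        delta = elapsed - s[1]
        s[1] += delta / s[0]               # running mean
        s[2] += delta * (elapsed - s[1])   # running sum of squared deviations
        return True

    def lookup(self, name):
        """Return (count, mean, coefficient of variation), or None if unknown."""
        i = bisect.bisect_left(self.names, name)
        if i == len(self.names) or self.names[i] != name:
            return None
        count, mean, m2 = self.stats[i]
        std = math.sqrt(m2 / count)
        return count, mean, (std / mean if mean > 0 else 0.0)
```

An idle agent would call `merge_one()` repeatedly until it returns False or a new job arrives; a record for an existing name only updates the stored mean and coefficient of variation.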

[Figure 6.11 has two panels, "Unordered List" and "Balanced Binary Tree". The entries
shown in the unordered list are:]

    Job Name    Exec. Time
    troff       125 sec
    cc          19 sec
    cp          .1 sec
    mail        .8 sec
    troff       93 sec

Figure 6.11. Job information data structures.


We make some final observations about an agent's management of job information.
Job information can be given to an agent when the agent is created. The agent
can add more information dynamically by observing jobs executing on its machine.
Agents can learn from each other by sharing information generated on their respective
machines. Finally, a human can, at any time, provide additional job information to
any agent, which treats this like any other observation, first placing it into its unordered
list, and eventually into the binary tree.

As for selecting the optimal time for transferring job information from an unordered
list to a binary tree, agents may take advantage of special-case knowledge about
how  the  load  varies  over  time,  namely  that  there  will  be  (often  predictable)  periods  of 
time  (e.g.,  between  3:00  AM  and  6:00  AM)  when  little  or  no  work  is  expected,  and 
thus  time  can  be  spent  reorganizing  information  so  that  performance  during  future 
periods  of  high  load  will  be  improved.  (An  agent’s  objective  function  could  be 
modified  so  that  the  expected  future  utility  is  increased  by  reserving  a  period  of  time 
during  which  no  work  is  accepted,  and  information  reorganization  can  take  place. 
Perhaps  humans  have  some  similar  mechanism  for  inducing  sleep.) 

In summary, we have shown how an agent obtains estimates for netdelay_ij(w)
and elap(w, 0), so that it can compute η_ij(w). This, along with knowledge of the
function μ(β), is necessary in computing a remote agent's conditional state utility.

Since the state transition probability matrix P_v has also been explicitly specified,
an agent A_i can compute the conditional expected utility (see Section 5.5.4) of remote
agent A_j using (5.31).

Let

$$U_1(\lambda, v, a) = \sum_{\beta=0}^{B_{\max}-1} \mu(\beta+1)\,[P_v^{a}]_{\lambda\beta} + \mu(B_{\max})\,[P_v^{a}]_{\lambda B_{\max}}.$$

U_1(λ,v,a) is simply the conditional expected state utility of a remote agent assuming
that a job is offloaded there, ignoring the network delay, and that the local agent
(which is computing the utility) knows the remote agent's load level λ, its degree of
variability v, and the age a of the information itself. Since our agents are homogeneous,
U_1(λ,v,a) is a valid utility model for all of them. (U_1(λ,v,a) seems expensive to
compute in real time, since it requires a large number of matrix multiplications; we
will address this problem shortly.)
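To make the cost of this computation concrete, the sketch below evaluates U_1 by raising a transition matrix to the power a and weighting the resulting row by the utility of each state plus one job. Everything here is illustrative: the utility function `mu` assumes the form -β / (1 - β/β_max) with β_max = 32, the rows of the matrix are indexed directly by state for simplicity, and the 3-state matrix `P_TOY` is a made-up example, not a matrix from the experiments.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_pow(p, age):
    """Return p raised to the integer power `age` (age 0 gives the identity)."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(age):
        result = mat_mul(result, p)
    return result


def mu(beta, beta_max=32):
    """Assumed local state utility for beta ready jobs."""
    return -beta / (1.0 - beta / beta_max)


def u1(p_v, lam, age, b_max):
    """Expected utility after one more job arrives, given state lam observed `age` steps ago."""
    row = mat_pow(p_v, age)[lam]
    return sum(mu(beta + 1) * row[beta] for beta in range(b_max)) + mu(b_max) * row[b_max]


# Hypothetical 3-state chain (states 0..2, so b_max = 2); each row sums to 1.
P_TOY = [
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
]

if __name__ == "__main__":
    for age in (0, 1, 10, 100):
        print(age, round(u1(P_TOY, 0, age, 2), 4))
```

At age 0 all probability mass sits at the reported state, so the result reduces to the utility of that state plus one job; the repeated matrix products for larger ages are what make a direct evaluation costly.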

We saw earlier how the state transition probabilities varied with the age of information
(this generated a family of distributions). We now consider how the conditional
expected utility varies with aging information. It is only necessary to analyze
U_1(λ,v,a), as η_ij(w) is independent of information age.

Figures 6.12 and 6.13 each show a family of curves for U_1(λ,v,a), where in Figure
6.12 the load level is fixed at λ = 0, while in Figure 6.13 the load level is fixed at
λ = 4. Within each family, each curve corresponds to a different value of v ∈ {0, 2, 4},
and U_1(λ,v,a) is varied with respect to information age a. Notice how the utility
decreases as information ages for these relatively low load levels.

Consider the situation where an agent A_i has information about two remote
agents A_j and A_k. A_i's information about A_j is that A_j's load level is 0, and that the
degree of variability is 4. A_i's information about A_k is that A_k's load level is 4 and
the degree of variability is 0. Information about each has the same age a. The question
is: which of the remote agents has the highest expected state utility if a job is
offloaded there, ignoring network delays?

Since A_j's load level is lower than that of A_k, it would seem that A_j should
have a better expected state utility. But the information concerning A_j has a higher
degree of variability than that of A_k; therefore, as the information ages, the decisionmaking
agent A_i becomes more certain about A_k's state than it is about A_j's state.
This is illustrated in Figure 6.14.

Figure 6.14. Utility curve comparison. (Horizontal axis: information age in minutes.)


According to this figure, A_j has a higher expected state utility when the information
is less than approximately 4.5 minutes old; after that, A_k has a higher expected
state utility. This illustrates why it is necessary to base decisions on a comparison of
expected utilities, which takes uncertainty of information into account, rather than on
a simple comparison of agent states (which in this case would be information about
load levels), which ignores information uncertainty.

These examples have ignored network delay, whose effect on state utility, for a
given job w, is the quantity η_ij(w). Based on (5.31), conditional expected utility is
given by

$$E[u_{ji}(B_j(n), w) \mid \lambda_j, v_j, a_{ji}] = \eta_{ij}(w) + U_1(\lambda_j, v_j, a_{ji}).$$

The effect of η_ij(w) on the curves reported in Figures 6.12, 6.13, and 6.14 is to
translate them downward (since η_ij(w) is always negative). We shall illustrate this by
extending our previous example about the decisionmaking agent A_i and the remote
agents A_j and A_k. Suppose that the distance from A_i to A_j is 10 hops (a very large
distance), and the distance to A_k is 1 hop. Assume estimate(elap(w, 0)) = 20 seconds,
where w is the job to be possibly offloaded by A_i, and sz(w) = 700 kbytes. Since

$$\eta_{ij}(w) = -\frac{sz(w) \cdot d_{ij} / 64\ \text{kbyte/s}}{\mathit{estimate}(\mathit{elap}(w, 0))},$$

then, for agent A_j, we have

$$\eta_{ij}(w) = -\frac{700 \cdot 10 / 64}{20} = -5.47,$$

and, for agent A_k,

$$\eta_{ik}(w) = -\frac{700 \cdot 1 / 64}{20} = -.55.$$


The utility curves, assuming that A_j's reported load level is 0 and its degree of variability
is 4, and that A_k's reported load level is 4 and its degree of variability is 0, are
shown in Figure 6.15.


Figure 6.15. Effect of network delay on utility.


A_j's utility curve is translated downward by 5.47 units, and A_k's utility curve is
translated downward by .55 units. Thus, when A_i accounts for network delay, A_k's
expected state utility is always better than A_j's, regardless of the age of information.
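The two downward shifts can be reproduced with a few lines of arithmetic. This is a minimal sketch using the example's figures (700 kbytes, 64 kbytes per second, a 20 second execution-time estimate, and distances of 10 hops and 1 hop); the function name `eta` is illustrative.

```python
def eta(size_kbytes, hops, elapsed_s, bandwidth_kbytes_per_s=64.0):
    """Network-delay term: transfer time relative to execution time, negated."""
    netdelay_s = size_kbytes * hops / bandwidth_kbytes_per_s
    return -netdelay_s / elapsed_s


eta_j = eta(700, 10, 20)  # remote agent 10 hops away
eta_k = eta(700, 1, 20)   # remote agent 1 hop away

if __name__ == "__main__":
    print(round(eta_j, 2), round(eta_k, 2))
```

Because the shift scales linearly with the hop count, a distant agent must offer a much better state utility than a nearby one before offloading to it pays off.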

6.3.5.  Efficient  Utility  Computations 

The function U_1(λ,v,a) requires a number of matrix multiplications, a sum of
scalar multiplications, and other operations, which can be time-consuming. To avoid
this, agents are given a three-dimensional table U_1(λ,v,a) of precomputed values.

How large is this table? For our experiments, we said that the load level can
take on values from

λ ∈ {0, 4, 8, 12, 16, 20, 24}.

The degree of variability can take on values from

v ∈ {0, 1, 2, 3, 4}.

If we measure a, the age of the state information, in seconds for up to five minutes
(this is considered a long time), then

a ∈ {1, 2, 3, ..., 300}.

Therefore, the size of the U_1(λ,v,a) table is 7 x 5 x 300 = 10,500 entries. Each entry
must represent a value for utility, which ranges from -115 to 0 (this comes from
μ(β), 0 ≤ β ≤ 25), and therefore can consist of a single byte. Thus, our table uses
approximately ten kilobytes of memory.
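One possible layout for such a table is sketched below. The flat indexing scheme, the clamping of ages to the 1 to 300 second range, and the encoding of a utility as the rounded magnitude in one byte are all assumptions made for illustration; the text only states the table's dimensions and that one byte per entry suffices.

```python
LOAD_LEVELS = (0, 4, 8, 12, 16, 20, 24)
VARIABILITIES = (0, 1, 2, 3, 4)
MAX_AGE_S = 300

TABLE_SIZE = len(LOAD_LEVELS) * len(VARIABILITIES) * MAX_AGE_S


def table_index(lam, v, age_s):
    """Map (load level, variability, age in seconds) to a flat table offset."""
    i = LOAD_LEVELS.index(lam)
    j = VARIABILITIES.index(v)
    a = min(max(age_s, 1), MAX_AGE_S)  # assumed: clamp ages outside 1..300
    return (i * len(VARIABILITIES) + j) * MAX_AGE_S + (a - 1)


def encode(utility):
    """Store a utility in [-115, 0] as its rounded magnitude (0..115 fits in a byte)."""
    return min(255, max(0, round(-utility)))


def decode(byte_value):
    """Recover the (rounded) utility from a stored byte."""
    return -float(byte_value)


table = bytearray(TABLE_SIZE)  # one byte per entry, about ten kilobytes
table[table_index(0, 4, 60)] = encode(-3.2)  # hypothetical precomputed value
```

A lookup is then a single index computation followed by one byte read.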

An agent can efficiently compute a conditional expected utility by simply determining
η_ij(w), looking up U_1(λ,v,a), and adding them together. The most time-consuming
part is computing η_ij(w), which requires a search through a balanced binary
tree to obtain elap(w, 0). However, even if the tree contains 1000 job information
records (representing the most common jobs), the search will require at most 10 comparisons.
In fact, an agent A_i will typically compute η_ij(w) for a number of agents
A_j, but they will all use the same value for elap(w, 0); thus, the 10 comparisons are
really amortized over multiple conditional expected utility computations.

6.3.6.  Efficient  Payoff  Computations 

In Section 5.6.1, we defined the payoff Δ(δ_j, w) of the decision δ_j (which
represented the "offload to A_j" decision) as (see (5.34))

$$E[u_{ji}(y_j(t), w) \mid K_{ji}(t)] + E[u_{ii}(y_i(t), w) \mid K_{ii}(t)] + \sum_{k \ne i,j} E[u_k(y_k(t)) \mid K_{ki}(t)].$$

Theoretically, an agent A_i would compute Δ(δ_j, w) for all j, select those of its values
which are positive, and make a final selection using the space/time randomization
technique described in Section 5.6.2. Unfortunately, this is an extremely time-consuming
procedure as just described.

Let us carefully consider what computations must take place. We already know
that E[u_ji(y_j(t), w) | K_ji(t)] can be computed quickly by adding U_1(λ_j, v_j, a_ji) and
η_ij(w).

E[u_ii(y_i(t), w) | K_ii(t)] is given by (5.33), which is equivalent to

+  [P^)x M(y,(0)  =  fa- 

In general, an agent will only consider offloading a job to a remote agent whose load
level is low. Given an agent A_j with small λ_j, the probability that A_j will be in state
B_max, [P_{v_j}^{a_ji}]_{λ_j B_max}, will be very small. Therefore, E[u_ii(y_i(t), w) | K_ii(t)]
can be approximated by μ(β_i).

We are then left with the sum of expectations

$$\sum_{k \ne i,j} E[u_k(y_k(t)) \mid K_{ki}(t)].$$

This is simply a sum of expected utilities, given by (see (5.24))

$$E[u_k(B_k(n)) \mid \lambda_k, v_k, a_{ki}] = \sum_{\beta=0}^{B_{\max}} \mu(\beta)\,[P_{v_k}^{a_{ki}}]_{\lambda_k \beta}.$$

Rather than computing this expectation, we can create a table of precomputed values
similar to that for U_1(λ,v,a). Therefore, let

$$U_0(\lambda, v, a) = \sum_{\beta=0}^{B_{\max}} \mu(\beta)\,[P_v^{a}]_{\lambda\beta}$$

be the table of expected state utilities of an agent whose load level is λ, whose degree
of variability is v, and whose information's age is a.

Let

$$U_\Sigma = \sum_{k=1,\, k \ne i}^{N} U_0(\lambda_k, v_k, a_{ki}).$$

Then,

$$\Delta(\delta_j, w) \approx U_1(\lambda_j, v_j, a_{ji}) + \mu(\beta_i) + \bigl( U_\Sigma - U_0(\lambda_j, v_j, a_{ji}) \bigr).$$

This formula is much better in terms of efficiency of computation than the original
formula, except that U_Σ still is a potentially very large sum. If we consider that the
purpose of computing Δ(δ_j, w) for all j is eventually to compare them and select the
best ones (i.e., those that are better than the payoff of making the null decision), then
since U_Σ is a constant term added to each payoff, it can be dropped out of the calculation!
Consequently, we define the relative payoff as

$$\tilde{\Delta}(\delta_j, w) = U_1(\lambda_j, v_j, a_{ji}) + \mu(\beta_i) - U_0(\lambda_j, v_j, a_{ji}).$$

This requires three table lookups, one addition, and one subtraction.
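The relative payoff computation can be sketched as follows, with the precomputed tables modelled as dictionaries keyed by (load level, variability, age). The function name, the dictionary representation, and the sample numbers are hypothetical and serve only to show the three lookups, one addition, and one subtraction.

```python
def relative_payoff(u1_table, u0_table, mu_table, lam_j, v_j, age_ji, beta_i):
    """Relative payoff of offloading to agent j, as seen by agent i.

    u1_table / u0_table: precomputed expected utilities with and without the
    extra job, keyed by (lam, v, age). mu_table: local state utility by number
    of ready jobs.
    """
    key = (lam_j, v_j, age_ji)
    return u1_table[key] + mu_table[beta_i] - u0_table[key]


# Hypothetical table contents, for illustration only.
U1_TABLE = {(0, 4, 60): -3.0}
U0_TABLE = {(0, 4, 60): -1.5}
MU_TABLE = {2: -2.0}

payoff = relative_payoff(U1_TABLE, U0_TABLE, MU_TABLE, 0, 4, 60, 2)
```

Since every candidate's payoff omits the same constant, comparing relative payoffs ranks the candidates exactly as the full payoffs would.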

Finally, how many payoff computations must an agent make in order to arrive at
a load balancing decision? The agent is only interested in remote agents which are
lightly loaded. Furthermore, it need not necessarily find the least loaded agent, since
it will randomize its decision over a number of underloaded agents anyway. Thus, by
grouping agents into underloaded and not-underloaded groups, and then randomly
selecting a small number of the underloaded ones (we used 8 in our experiments), the
payoffs for offloading jobs to these agents can be determined, and a randomized decision
can then be made. Thus, only a small constant number of payoff computations,
which are themselves very simple, need to be made (and not one payoff computation
per remote agent).
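The bounded selection procedure might look like the sketch below. It is a simplification under stated assumptions: the final uniform random choice stands in for the space/time randomization technique of Section 5.6.2, which is not reproduced here, and `payoff`, `null_payoff`, and `is_underloaded` are hypothetical caller-supplied inputs.

```python
import random


def choose_target(agents, payoff, null_payoff, is_underloaded, sample_size=8, rng=random):
    """Pick a remote agent to offload to, or None to keep the job locally.

    Only `sample_size` underloaded agents are examined, so the number of
    payoff computations is bounded by a small constant regardless of how
    many remote agents exist.
    """
    underloaded = [a for a in agents if is_underloaded(a)]
    candidates = rng.sample(underloaded, min(sample_size, len(underloaded)))
    better = [a for a in candidates if payoff(a) > null_payoff]
    # Assumed stand-in for space/time randomization: uniform choice.
    return rng.choice(better) if better else None
```

Returning None corresponds to the null decision, taken when no sampled candidate beats the payoff of executing the job locally.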

6.4.  Experimental  Results 

To evaluate our methods, we conducted a number of experiments for a comparative
study of job placement strategies, including our own agent-based strategy. The
strategies differ in their costs for state information communication, in their costs for
decision procedure computations, in their job transfer costs, and in the degree to
which they minimize average job elapsed time, which is the performance index to be
optimized.

6.4.1.  Types  of  Job  Placement  Strategies 

The  five  job  placement  strategies  compared  in  this  study  are: 

(1) Perfect Information [P]: load balancing decisions are made with perfect
knowledge of the global system state. When a new job arrives at a machine, a
job placement decision is made which takes into account the load on every
remote machine, the job transfer time, and the job's expected execution time. In
a sense, this is the best decision which can be made given complete certainty
about the global system state, but not about concurrent decisions being made by
other machines. Therefore, this strategy is immune to the first problem of decentralized
control (see Section 3.4), but not to the second.

(2) Periodic Update [U]: load balancing decisions are made on the basis of aging
information about the global system state. Machines broadcast their state to all
other machines on a fixed periodic basis. When a new job arrives at a machine, a
job placement decision is made which takes into account the load on every
remote machine (based on imperfect information), the job transfer time, and the
job's expected execution time. The update period is the same for every machine;
a number of experiments were carried out with periods varying from 1 second to
5 minutes. The actual periods used were 1, 2, 4, 8, 15, 30, 60, 120, 180, and 300
seconds. This strategy is subject to the two fundamental problems of decentralized
control, and makes no attempt to combat them except by periodic refreshing
of information.

(3) Intelligent Agents [I]: load balancing decisions are based on the principles and
techniques described in this dissertation. State information is updated on the
basis of our analysis of the tradeoffs between communication overhead and degradation
in decision quality due to aging information. Decisions account for uncertainty
of information, and for expected payoffs based on conditional expected utility.
Space/time randomization is used to avoid resonances. This strategy is subject
to the two fundamental problems of decentralized control, and combats them in
the best ways we believe are possible.

(4) No Load Balancing [B]: this strategy does not do any load balancing; i.e., jobs
always execute locally. This represents the baseline strategy; comparing the
other methods to the baseline strategy tells us whether they are better than
essentially doing nothing.

(5) Random Decisions [R]: the random decision strategy is to offload a job to a
randomly selected remote machine if the local number of ready jobs is above a
fixed threshold. The threshold we used was 3, and was determined experimentally
to be optimal (i.e., to minimize average job response time) under random
decisionmaking in our experimental context. Comparing other methods to this
strategy tells us whether more complex decisionmaking schemes are better than
making purely random decisions.
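Of the five strategies, [R] is simple enough to sketch in a few lines. The function name, the machine-identifier representation, and the uniform choice over all remote machines are illustrative assumptions; only the threshold of 3 ready jobs comes from the description above.

```python
import random

THRESHOLD = 3  # offload only when more than this many jobs are ready locally


def place_job_random(local_id, ready_jobs, machine_ids, threshold=THRESHOLD, rng=random):
    """Strategy [R]: return the machine on which the newly arrived job should run."""
    if ready_jobs <= threshold:
        return local_id  # load is acceptable, execute locally
    remotes = [m for m in machine_ids if m != local_id]
    return rng.choice(remotes) if remotes else local_id
```

No state information about the remote machines is consulted, which is why this strategy incurs no communication cost for state updates.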

6.4.2.  Accounting  for  Costs 

Each of the strategies has different costs. With respect to state information
communication costs: [B], [P], and [R] are subject to none; [U] is subject to these
costs, and they can be very significant for small update periods; [I] is subject to these
costs, but explicitly tries to keep them low.

With respect to job transfer costs, only strategy [B] is not subject to them since it
does not do any load balancing. For the other strategies ([P], [U], [I], and [R]), the
job transfer costs are the same.

With  respect  to  decision  procedure  computation  costs,  strategy  [B]  is  not  subject 
to  them  because  it  does  not  do  any  load  balancing,  and  strategy  [R]  is  not  subject  to 
them  because  of  the  simplicity  of  the  random  selection  decision.  Strategies  [P]  and 
[U]  are  subject  to  these  costs,  and  have  the  same  costs  because  they  use  essentially  the 
same  decision  procedure;  their  procedures  only  differ  in  the  quality  of  the  information 
on  which  they  base  their  decisions.  Finally,  strategy  [I]  has  the  most  complex,  and 
consequently  the  most  costly,  decision  procedure,  even  though  we  will  show  that  the 
cost  is  not  really  significant.  Figure  6.16  displays  a  3-dimensional  cost  space,  with 
points  for  each  strategy  qualitatively  representing  their  approximate  relative  costs. 

Figure 6.16. Location of strategies in cost space.


These costs are explicitly accounted for in our experiments by simulating communication,
job transfer, and decision procedure computation times. As already
described in Section 6.1.4, communication and job transfer times are based on the
amount of data to be sent, the distance between the sender and the receiver, and the
communication bandwidth. More interestingly, since all decision procedures
correspond to the actual execution of code within the simulated experiment (i.e.,
decisionmaking procedures are not simulated), the decision procedure computation
time is based on the real amount of time it takes to compute the decision. Consequently,
simulated time is delayed by the appropriate amounts every time a decision
procedure computation is made.

6.4.3.  Number  and  Types  of  Experiments 

We conducted 120 experiments. Each experiment simulated a distributed system
comprised of 30 machines, and used a different combination of 30 traces, selected from
a total set of 300. Each experiment consisted of a set of simulations, one for each
strategy [P], [U] (which included a separate simulation for each update period), [I],
[B], and [R]. To limit variations in load distributions between experiments, traces
were selected such that the sum of their average loads (see Section 6.1.5), and the
distribution of average loads across machines, were virtually the same. (Within an
experiment, the same combination of traces was used for each simulation of a different
strategy.) On the average, approximately 65000 jobs were executed per experiment.
All the results to be presented consist of statistics of performance measures based on
all the experiments.

6.4.4.  Results 

The  first  experimental  result  we  present  addresses  how  each  strategy  performs  in 
terms  of  the  average  job  CPU  queueing  delay  time.  Job  CPU  queueing  delay  is  the 
total  time  a  job  is  delayed  due  to  CPU  contention  with  other  jobs.  If  the  strategy 
makes  good  load  balancing  decisions,  the  average  CPU  queue  should  be  kept  low 
because  jobs  will  be  spread  more  evenly  over  the  various  machines. 

Figure 6.17 contains a graph showing the improvement in average job delay
versus the information update time for each strategy. The other graphs we will
present will be of this type. In general, they will show some performance index on the
vertical axis (typically expressed as a percentage of the optimal value; the baseline
value will correspond to 0%), and the information update period on the horizontal
axis, which is scaled logarithmically. Points (with lines connecting them) for strategy
[U] will be plotted for each update period. Horizontal lines for strategies [P], [I], [B],
and [R], all of which are independent of the horizontal axis of the graphs, will be
shown for comparison. Each point (for [U]) or horizontal line (for [P], [I], [B], and
[R]) represents the mean value statistic of the performance index, averaged over 120
runs.

For the graph in Figure 6.17, a 100% improvement in delay corresponds to no
delay; 0% corresponds to the average job delay for jobs in the baseline experiments
using strategy [B]. In absolute terms, each percentage point corresponds to an
approximate reduction of 17.5 milliseconds in CPU queueing delay for every job.
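The relation between the percentage scale and absolute delay can be written out as a small sketch. The baseline of roughly 1750 milliseconds is not quoted in the text; it is derived here from the stated 17.5 milliseconds per percentage point, and the function names are illustrative.

```python
MS_PER_POINT = 17.5                    # stated reduction per percentage point
BASELINE_MS = 100 * MS_PER_POINT       # implied average delay under [B] (derived)


def improvement_pct(baseline_ms, observed_ms):
    """Percentage improvement: 0% is the baseline delay, 100% is no delay."""
    return 100.0 * (baseline_ms - observed_ms) / baseline_ms


def delay_reduction_ms(pct):
    """Approximate absolute reduction in CPU queueing delay per job."""
    return pct * MS_PER_POINT
```

For instance, a strategy showing a 50% improvement would, under these figures, reduce the average CPU queueing delay by about 875 milliseconds per job.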

Strategy [U] has poor performance for very small and very large update periods.
For very small update periods, communication overhead costs are very high; for very
large update periods, degradation due to decisions based on stale state information is
high. For the optimal update period, which is 15 seconds, [U] provides
approximately a 49.1% improvement in delay.


Table 6.1. Percentage Improvement in Average CPU Queueing Delay

                                        % Improvement
    Strategy                   .05 quantile    mean    .95 quantile
    [I] Intelligent Agents         67.6        67.7        67.8
    [P] Perfect Information        55.1        63.8        68.1
    [U] Periodic Update            38.3        49.1        53.2
    [R] Random Placement           46.8        47.5        47.9
    [B] No Load Balancing           0.0         0.0         0.0


Table 6.1 summarizes the results for each strategy. Strategy [P], which is based
on perfect state information, yields a 63.8% improvement in delay, while strategy [I] is
slightly better at 67.7%. What is noticeably different between [P] and [I] is the
variation of the improvement, indicated by the width of the interval
between the .05 and .95 quantiles. [P] has a much wider variation than [I]. In particular,
[P]'s .05 quantile is significantly lower than the mean, whereas [I] does not exhibit
this problem. We believe that this is due to [I]'s mechanism for avoiding resonances,
which is not present in [P], as this is the only advantage of [I] over [P]. Strategy [R]
does quite well relative to [U], with a 47.5% mean improvement in delay, although this
is significantly lower than those of [P] or [I]. Notice that the statistical variation for
[R] is small, like that for [I], suggesting again that randomization goes far in avoiding
resonances.

Although  the  average  job  CPU  queueing  delay  is  an  important  performance 
index, minimizing it does not necessarily ensure that the average job elapsed time is
minimized.  This  manifests  itself  in  load  balancing  systems  where  job  transfer  time  is 
a  significant  cost.  Consequently,  although  offloading  of  jobs  to  balance  the  load 
causes  CPU  queues  to  be  shorter  on  the  average,  the  delay  a  job  experiences  during  its 
transfer  to  other  machines  will  increase  its  overall  elapsed  time. 

Figure 6.18 is a graph of the improvement in average elapsed time versus the
information update period, for each strategy. A 100% improvement in average
elapsed time means that there are no delays due to CPU queueing, network
queueing, or network transmission. Thus, a 100% improvement corresponds to the
minimum elapsed times jobs could experience, and 0% corresponds to the average
job elapsed time in the baseline experiments using strategy [B]. In absolute
terms, each percentage point corresponds to an approximate reduction of 228
milliseconds in elapsed time for every job. Table 6.2 summarizes the results.
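
The normalization just described can be sketched as follows. This is only an illustration, assuming the scale is linear between the baseline and the minimum elapsed time; the function name and the two sample times are hypothetical, and only the 228 ms per percentage point figure comes from the text.

```python
def percent_improvement(baseline_s, minimum_s, observed_s):
    """Map an average elapsed time onto the 0-100% improvement scale.

    0% is the baseline (strategy [B]); 100% is the minimum elapsed time,
    i.e. no CPU queueing, network queueing, or transmission delay.
    """
    span = baseline_s - minimum_s
    if span <= 0:
        raise ValueError("baseline must exceed the minimum elapsed time")
    return 100.0 * (baseline_s - observed_s) / span


# Hypothetical times, chosen so that one percentage point is 0.228 s
# (baseline - minimum = 22.8 s). Only the 228 ms figure is from the text.
MINIMUM_S = 10.0
BASELINE_S = MINIMUM_S + 22.8

# Seconds of elapsed time saved per percentage point of improvement.
one_point_s = (BASELINE_S - MINIMUM_S) / 100.0
```

Under these assumptions, a mean improvement of p% corresponds to roughly p times 0.228 seconds saved per job.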

Percentage Improvement in Average Elapsed Time

                                  % Improvement
Strategy                   .05 quantile    mean    .95 quantile
[I] Intelligent Agents         67.0        67.1        67.2
[P] Perfect Information        52.8        62.7        67.6
[U] Periodic Update            37.3        48.1        51.7
[R] Random Placement           43.9        44.3        44.8
[B] No Load Balancing           0.0         0.0         0.0

Table 6.2.


As one might expect, the results are very similar to those for the average CPU
queueing delay improvement, except for the degradation under [R], which is more
significant. We see again that, under [U], the performance is poor for very small
and very large update periods, with the best period of 15 seconds offering a
48.1% improvement. This is 1.9% less than the CPU queueing delay improvement,
suggesting that [U] is less sensitive to the cost of network transmissions and
delays than it is to CPU queueing delays (which is true). Under [R], there is a
44.3% improvement, which is 3.2% less than the CPU queueing delay improvement.
This is expected since, under [R], machines are not selected on the basis of
distance, or on any other criterion; they are simply selected randomly.

Again, the best mean improvements occur under [I] (67.1%) and under [P]
(62.7%). In particular, the difference between the elapsed time improvement
under [I] and the CPU queueing delay improvement under [I] is 0.5%; thus, [I]
seems to account for network transmission delays better than the other
strategies.

Figure 6.19 is a graph showing the percentage of time a CPU spends
communicating state information and computing decision procedures. Table 6.3
summarizes the results. Under strategy [U] with an update period of 1 second,
every machine spends an average of 49.6% of its time on overhead, mostly due to
communication. As the update period increases, the overhead is reduced, as
expected. For an update period of 60 seconds, the overhead is about 4.0%. For
larger update periods, overhead rises a bit due to unbalanced load
distributions, which cause some machines to have high loads and consequently to
spend more time on local job scheduling (see Section 6.1.4).

Percentage Time Spent for Overhead

                                   % Time Spent
Strategy                   .05 quantile    mean    .95 quantile
[I] Intelligent Agents         3.97        3.98        3.98
[P] Perfect Information        2.85        2.88        3.01
[U] Periodic Update            4.00        4.04        4.31
[R] Random Placement           3.12        3.14        3.17
[B] No Load Balancing          3.70        3.70        3.71

Table 6.3.


For strategies [R], [P], [I], and [B], overhead is lower than under [U] in all
cases. This is not surprising for [B] and [R], which have only local job
scheduling overhead, and for [P], which also has to compute its relatively
simple decision procedure. What is surprising is that, although the overhead
for [I] is higher than for all the other strategies except [U], it is still very
low in absolute terms. This empirical evidence suggests that intelligent
decentralized control is feasible at low cost.

Why does our strategy [I] do so well, and yet impose so little overhead? One of
the hypotheses we made at the outset was that, if decisions could be based on
information whose reliability could be quantified, and if communication costs
could be kept low by updating information only when necessary, we would achieve
our goal. It is clear that good decisions are being made under [I], as it
surpasses all the other strategies in optimizing the performance index. What
can be said about the frequency with which state information is updated?

[Figure: histogram of Percentage of Messages versus Inter-update Period]

Figure 6.20. Distribution of Update Periods.


Figure 6.20 shows the histogram of update periods between agents under strategy
[I]. It shows that update periods under approximately 30 seconds rarely occur.
Since the optimal communication period under [U] was approximately 15 seconds,
this suggests that the quantification of information uncertainty and its
integration into decisionmaking using conditional expected utility (all of
which are lacking under [U]) can dramatically improve decisions. The average
update period given by the histogram in Figure 6.20 is 80.16 seconds! This is a
dramatic illustration that the principle of frugal state information
communication, presented in Section 4.6, is of critical importance in reducing
communication overhead, which can be a significant cost in distributed systems.
In particular, as systems get larger, these effects are magnified.

SUMMARY  AND  CONCLUSIONS


In  this  chapter,  we  summarize  the  main  points  of  this  dissertation  and  we  present 
the  major  conclusions. 

7.1.  Summary 

In  Chapter  3,  we  presented  a  formal  model  for  decentralized  control,  and  showed 
that  there  are  two  fundamental  problems: 

1.  No  agent  can  know  with  certainty  the  current  global  state. 

2.  No  agent  can  know  with  certainty  the  current  actions  of  remote 
agents. 

In Chapter 4, we presented a set of principles for constructing approximate
solutions.

•  Adopt  a  knowledge-based  solution:  incorporate  all  special-case  knowledge 
about  the  problem  as  an  integral  part  of  the  decisionmaking  process. 

•  Apply  knowledge  abstraction:  summarize  information  into  a  form  which  can 
be  utilized  and  communicated  more  efficiently. 

•  Quantify  uncertainty:  explicitly  account  for  information  uncertainty  in 
decisionmaking. 

•  Use  directional  heuristics:  select  decisions  based  on  their  tendencies  to 
increase  utility. 

•  Integrate  information  aging  in  decisionmaking:  condition  expected  state 
utility  on  the  age  of  information. 

•  Communicate frugally: communicate only when the cost of the consequences
of using out-of-date information in decisionmaking exceeds the cost of
communication overhead.

•  Avoid resonances using SPACE/TIME randomization: randomize over
the space of good decisions and over the time during which these decisions can be
made to avoid mutually conflicting decisions between agents.
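
The "communicate frugally" principle above amounts to a cost comparison, which can be sketched as follows. This is a hedged illustration rather than the decision procedure used in the experiments: the function names, and the modeling of the staleness cost as a probability times a penalty, are assumptions made for this example.

```python
def expected_staleness_cost(p_view_wrong, cost_of_bad_decision):
    """Expected cost of letting remote agents keep using an old view.

    p_view_wrong: assumed probability that the remote view no longer
        matches the local state.
    cost_of_bad_decision: assumed penalty if a decision is based on it.
    """
    return p_view_wrong * cost_of_bad_decision


def should_communicate(p_view_wrong, cost_of_bad_decision, cost_of_message):
    """Send an update only when staleness costs more than the message."""
    stale_cost = expected_staleness_cost(p_view_wrong, cost_of_bad_decision)
    return stale_cost > cost_of_message


# Hypothetical numbers: a nearly fresh view is not worth a message,
# while a badly out-of-date view is.
send_when_fresh = should_communicate(0.1, 5.0, 1.0)
send_when_stale = should_communicate(0.5, 5.0, 1.0)
```

With these hypothetical values, the first call evaluates to False and the second to True, so the agent stays silent while its advertised state remains a good predictor.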

Our goal has been to show that, despite the formidable nature of the two
fundamental problems of decentralized control, the techniques described in this
dissertation can provide acceptable approximate solutions to them. This was
demonstrated in Chapter 5 by the effective application of the techniques to the
general problem of decentralized load balancing. The main results, presented in
Chapter 6, were that agents can make good decisions (measured by a marked
increase in system performance) which do not mutually conflict even though they
use uncertain state information. In particular, frequent communication was
found to be unnecessary.

7.2.  Conclusions 

The major conclusions we draw from this research are the following.

Correct  abstract  state  design  is  crucial  to  efficient  decentralized  control. 

The design of the abstract state space has an underlying effect on all aspects of
distributed decisionmaking. The abstract state space should have a strong
correspondence to the low-level state space partition imposed by the decision
space so that decisions can be selected correctly and quickly. It should be
small, to minimize the storage required for global state information and reduce
the number of terms in the expected utility and payoff computations. If the
abstract space is a good one, a simple model for prediction can be constructed
(e.g., in the form of a Markov state transition model). Such a model will have
slow state transition rates to minimize the need for remote agents to receive
state information updates.
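
As a hedged illustration of such a prediction model, the sketch below propagates the last observed abstract state through a discrete-time Markov transition matrix. The three-state space and the transition probabilities are hypothetical values chosen for this example, not parameters taken from the experiments.

```python
def step(dist, P):
    """Advance a state distribution by one transition of the chain."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def predict(observed_state, P, steps):
    """Distribution over abstract states `steps` transitions after an
    observation that the remote agent was in `observed_state`."""
    dist = [0.0] * len(P)
    dist[observed_state] = 1.0
    for _ in range(steps):
        dist = step(dist, P)
    return dist


# Hypothetical abstract load states: 0 = light, 1 = moderate, 2 = heavy.
# Diagonal entries near 1 model slow state transition rates.
P = [
    [0.95, 0.05, 0.00],
    [0.05, 0.90, 0.05],
    [0.00, 0.05, 0.95],
]

after_1 = predict(0, P, 1)    # still very likely "light"
after_10 = predict(0, P, 10)  # the observation has lost some of its value
```

The slower the transitions, the longer an old observation stays informative, which is what lets remote agents go longer between updates.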

Note that it is important to separate measures of states and the utility of states
(i.e., the state's value and the state's utility should not generally be one and
the same). As the state space design is influenced by the construction of
convenient prediction models (e.g., state transition models), and the utility
function is influenced by the performance measure to be optimized, combining
these influences may not be possible. Forcing an equivalence between state value
and state utility will detract from the effectiveness of the state prediction
model, or the effectiveness of the performance optimization, or both.
Furthermore, the variation of the expected future state and the expected future
state utility as a function of aging information will generally be different.

Qualifying  state  information  by  quantifying  its  uncertainty  improves 
decisionmaking. 

In general, an agent will regard each item of information about the state of
remote agents with varying degrees of uncertainty. Basing decisions not only on
what each item of information says, but also on how reliable it is, can have a
dramatic effect on improving the quality of decisionmaking. This is due to the
general sensitivity of the decisions made by a large number of agents over a
small interval of time to a relatively small number of items of information,
namely, the states of the most desirable agents (e.g., agents which have a large
capacity for work). The added dimension of reliability of information allows
better discrimination of agent utilities.

Formulating  expected  state  utility  as  a  function  of  aging  information  is 
valuable  for  correctly  evaluating  alternatives. 

Defining the utility of states enables us to use decision theory, which provides
a formalism for how a decisionmaker can evaluate alternatives in a statistically
optimal manner when the underlying information is uncertain. Decisionmaking is
significantly improved when decisions based on this state information are
sensitive to the information's age. The effects of aging information can be
incorporated as an integral part of decisionmaking by formulating the expected
state utility as a function of the state information's age. This is important
for distributed decisionmaking in very large distributed systems where
communication costs and delays are significant, and therefore, different items
of information will have varying and potentially large ages.
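
One way to make the expected state utility depend on information age is sketched below. It assumes a discrete-time Markov model of the remote state and a utility assigned to each state; the two-state chain, the utility values, and the function name are hypothetical and serve only to illustrate the idea.

```python
def expected_utility(observed_state, age, P, utility):
    """Expected utility of a remote agent's state, given the state last
    observed and the number of transitions (`age`) since that observation."""
    n = len(P)
    dist = [1.0 if i == observed_state else 0.0 for i in range(n)]
    for _ in range(age):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return sum(p * u for p, u in zip(dist, utility))


# Hypothetical two-state model: 0 = idle, 1 = busy.
P = [
    [0.9, 0.1],
    [0.2, 0.8],
]
UTILITY = [1.0, 0.0]  # assumed value of sending a job to each kind of machine

fresh = expected_utility(0, 0, P, UTILITY)      # just observed idle
aged = expected_utility(0, 5, P, UTILITY)       # observed idle 5 steps ago
old_busy = expected_utility(1, 20, P, UTILITY)  # observed busy long ago
```

In this toy model the value of an "idle" report decays with age, while a very old "busy" report says little more than the long-run average, so two agents reporting the same state can deserve different rankings.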

State  information  communication  can  often  be  replaced  with  inferencing. 

By  making  inferences  based  on  past  information  and  predictive  models,  the  need  for 
communication  can  be  reduced  significantly.  In  effect,  communication  is  replaced  with 
local  computation,  which  is  a  desirable  goal  in  large  distributed  systems  of  cooperative 
agents.  This  is  greatly  dependent  on  the  rate  at  which  information  becomes  stale,  and 
how  well  this  is  accounted  for  by  the  decisionmaking  process. 

Space/time  randomization  is  an  effective  way  of  avoiding  resonances. 

The advantage of space/time randomization is that it is a cheap decision
selection procedure which dramatically reduces the possibility of mutually
conflicting decisions. Therefore, it is an effective solution to the second
fundamental problem of decentralized control. In particular, as distributed
systems become larger and larger, space/time randomization becomes more and
more valuable as it avoids reliance on explicit communication.
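
A minimal sketch of the idea follows, assuming that "good decisions" means machines whose estimated load is within a tolerance of the best one and that both random choices are uniform. The machine names, loads, tolerance, and delay window are hypothetical.

```python
import random


def pick_target(estimated_loads, tolerance, rng):
    """Space randomization: choose uniformly among near-best machines."""
    best = min(estimated_loads.values())
    good = sorted(m for m, load in estimated_loads.items()
                  if load <= best + tolerance)
    return rng.choice(good)


def pick_delay(max_delay_s, rng):
    """Time randomization: spread decisions over a window of time."""
    return rng.uniform(0.0, max_delay_s)


# Hypothetical view of the system shared by 100 agents.
VIEW = {"m1": 0.10, "m2": 0.15, "m3": 0.12, "m4": 0.90}

rng = random.Random(0)
# Without randomization every agent picks the same "best" machine.
greedy_targets = [min(VIEW, key=VIEW.get) for _ in range(100)]
# With randomization the choices spread over the good machines and over time.
random_targets = [pick_target(VIEW, 0.06, rng) for _ in range(100)]
delays = [pick_delay(2.0, rng) for _ in range(100)]
```

No messages are exchanged between the agents in this sketch; the spreading comes entirely from the local random choices.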